I spent one day solving this problem and couldn't find a solution to pass the large dataset.
Problem
An n parentheses sequence consists of n "("s and n ")"s.
Now, we have all valid n parentheses sequences. Find the k-th smallest sequence in lexicographical order.
For example, here are all valid 3 parentheses sequences in lexicographical order:
((()))
(()())
(())()
()(())
()()()
Given n and k, write an algorithm to give the k-th smallest sequence in lexicographical order.
For large data set: 1 ≤ n ≤ 100 and 1 ≤ k ≤ 10^18
This problem can be solved by using dynamic programming
Let dp[n][m] = number of valid ways to finish the sequence when n open brackets and m close brackets are still unused (so n <= m).
Base case:
dp[0][a] = 1 (a >=0)
Fill in the matrix using the base case:
dp[n][m] = dp[n - 1][m] + (n < m ? dp[n][m - 1]:0 );
Then we can build the k-th sequence one character at a time.
Start with a = n open brackets and b = n close brackets left, and an empty result.
If k > dp[n][n], k is invalid. Otherwise:
while a + b > 0:
    If a > 0 and dp[a - 1][b] >= k:
        * Append an open bracket '(' to the current result
        * Decrease a
    Else:
        // dp[a - 1][b] is the number of lexicographically smaller sequences (those that continue with '(')
        * Adjust value of k: `k -= dp[a - 1][b]` (subtract nothing when a = 0)
        * Append a close bracket ')'
        * Decrease b
Notice that open bracket is less than close bracket in lexicographical order, so we always try to add open bracket first.
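The steps above can be sketched in Python as follows. This is a minimal sketch, not the answerer's own code: the function name `kth_parentheses` is made up for illustration. Python integers have arbitrary precision, so the dp values (which grow far beyond 10^18 for n = 100) cause no overflow here; in a language with fixed-width integers you would probably have to cap them.

```python
def kth_parentheses(n, k):
    """Return the k-th (1-based) valid sequence of n pairs in lexicographical order."""
    # dp[a][b]: number of valid completions with a '(' and b ')' still unused (a <= b)
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for b in range(n + 1):
        dp[0][b] = 1
    for a in range(1, n + 1):
        for b in range(a, n + 1):
            dp[a][b] = dp[a - 1][b] + (dp[a][b - 1] if a < b else 0)

    if not 1 <= k <= dp[n][n]:
        raise ValueError("k is out of range")

    a = b = n
    out = []
    while a + b > 0:
        if a > 0 and dp[a - 1][b] >= k:
            # the k-th sequence is among those that continue with '('
            out.append("(")
            a -= 1
        else:
            # skip every sequence that continues with '('
            if a > 0:
                k -= dp[a - 1][b]
            out.append(")")
            b -= 1
    return "".join(out)


print([kth_parentheses(3, k) for k in range(1, 6)])
```

For n = 3 this prints the five sequences in the order listed in the question.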
Let S = any valid sequence of parentheses built from n '(' and n ')'.
Any valid sequence S can be written as S = X + Y where
X = valid prefix, i.e. traversing X from left to right, at any point number of '(' >= number of ')'
Y = valid suffix, i.e. traversing Y from right to left, at any point number of '(' <= number of ')'
For any S, many pairs of X and Y are possible.
For our example: ()(())
`()(())` =`empty_string + ()(())`
= `( + )(())`
= `() + (())`
= `()( + ())`
= `()(( + ))`
= `()(() + )`
= `()(()) + empty_string`
Note that when X=empty_string, then number of valid S from n( and n)= number of valid suffix Y from n( and n)
Now, Algorithm goes like this:
We will start with X= empty_string and recursively grow X until X=S. At any point of time we have two options to grow X, either append '(' or append ')'
Let dp[a][b]= number of valid suffixes using a '(' and b ')' given X
nop=num_open_parenthesis_left ncp=num_closed_parenthesis_left
`calculate(nop,ncp)
{
    if nop == 0
        return 1; // base case: only ')' remain, so exactly one suffix is possible
    if dp[nop][ncp] is not known
    {
        i1=calculate(nop-1,ncp); // Case 1: X= X + "("
        i2=((nop<ncp)?calculate(nop,ncp-1):0);
        /*Case 2: X=X+ ")". If nop>=ncp, then after using one ')' we have nop>ncp, so there can be no valid suffix*/
        dp[nop][ncp]=i1+i2;
    }
    return dp[nop][ncp];
}`
Let's take an example: n=3, i.e. 3 '(' and 3 ')'.
At the very start X=empty_string, therefore
dp[3][3] = number of valid sequences S using 3 '(' and 3 ')'
= number of valid suffixes Y from 3 '(' and 3 ')'
I have M characters. From these characters I need to make a sequence of length N such that no two consecutive characters are the same, and the first and last characters of the sequence are fixed. I need to find the total number of such sequences.
My Approach:
Dynamic programming.
If first and last character are '0' and '1'
dp[1][0]=1 , dp[1][1]=1
for(int i=2;i<N;i++)
    for(int j=0;j<M;j++)
        for(int k=0;k<M;k++)
            if(j!=k) dp[i][j]+=dp[i-1][k]
So the final answer would be the sum of dp[N-1][i] over all i!=1
Problem:
Here the length N is very large, around 10^15, and M is around 128. How can I find the number of sequences without using arrays of that size?
Assume M is fixed. Let D(n) be the number of sequences of length n with no two consecutive characters equal, where the first and last characters differ (but are fixed). Let S(n) be the number of such sequences of length n where the first and last characters are the same (but are fixed).
For example, D(6) is the number of strings of the form a????b (for some a and b -- noting that for counting it doesn't matter which two characters we chose, and where the ? represent other characters). Similarly, S(6) is the number of strings of the form a????a.
Consider a sequence of length n>3 of the form a....?b. The ? can be any of M-1 characters (anything except b). One of these is a. So D(n) = S(n-1) + (M-2)D(n-1). Using a similar argument, one can figure out that S(n) = (M-1)D(n-1).
For example, how many strings are there of the form a??b? Well, the character just before the b could be a or something else. How many strings are there when it's a? Well, it's the same as the number of strings of the form a?a. How many strings are there when it's something else? Well, it's the same as the number of strings of the form a?c multiplied by the number of choices we had for c (namely: M-2 -- everything except for a, which we've already counted, and b, which is excluded by the rules).
If the length is odd, we can consider the middle character. Consider a sequence of length 2n+1 of the form a...?...b. The ? (which is in the center of the string) can be a, b, or one of the other M-2 characters. Thus D(2n+1) = S(n+1)D(n+1) + D(n+1)S(n+1) + (M-2)D(n+1)D(n+1). Similarly, S(2n+1) = S(n+1)S(n+1) + (M-1)D(n+1)D(n+1).
For small n, S(2)=0, S(3)=M-1, D(2)=1, D(3)=M-2.
We can use the above equations (the first set for even n>3, the second set for odd n>3, and the base cases for n=2 or 3) to compute the result you need in O(log N) arithmetic operations. Presumably the question asks you to compute the result modulo something (since the result grows like O(M^(N-2))), but that's easy to incorporate into the results.
Working code that uses this approach:
def C(n, m, p):
    if n == 2:
        return 0, 1
    if n == 3:
        return (m-1)%p, (m-2)%p
    if n % 2 == 0:
        S, D = C(n-1, m, p)
        return ((m-1) * D)%p, (S + (m-2) * D)%p
    else:
        S, D = C((n-1)//2+1, m, p)
        return (S*S + (m-1)*D*D)%p, (2*S*D + (m-2)*D*D)%p
Note that in this code, C(n, m, p) returns two numbers -- S(n)%p and D(n)%p.
For example:
>>> p = 2**64 - 59 # Some large prime
>>> print(C(4, 128, p))
(16002, 16003)
>>> print(C(5, 128, p))
(2032381, 2032380)
>>> print(C(10**15, 128, p))
(12557489471374801501, 12557489471374801502)
Looking at these examples, it seems like D(n) = S(n) + (-1)^n. If that's true, the code can be simplified a bit I guess.
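The pattern does seem to follow from the first set of equations: subtracting them gives D(n) - S(n) = -(D(n-1) - S(n-1)), and D(2) - S(2) = 1. As a sanity check, here is a small sketch that iterates the recurrences without a modulus and compares them with a brute-force count for a small alphabet. The helper names `brute_force` and `s_and_d` are made up for this illustration and are not part of the code above.

```python
from itertools import product


def brute_force(n, m, first, last):
    """Count length-n sequences over range(m) with fixed ends and no equal neighbours."""
    count = 0
    for middle in product(range(m), repeat=n - 2):
        seq = (first, *middle, last)
        if all(x != y for x, y in zip(seq, seq[1:])):
            count += 1
    return count


def s_and_d(n, m):
    """Iterate S(n) = (m-1)*D(n-1) and D(n) = S(n-1) + (m-2)*D(n-1) from S(2)=0, D(2)=1."""
    s, d = 0, 1
    for _ in range(n - 2):
        s, d = (m - 1) * d, s + (m - 2) * d
    return s, d


for n in range(2, 8):
    s, d = s_and_d(n, 4)
    assert s == brute_force(n, 4, 0, 0)   # same first and last character
    assert d == brute_force(n, 4, 0, 1)   # different first and last character
    assert d - s == (-1) ** n             # the conjectured relation
```

The brute force is exponential in n, so it is only useful for checking small cases.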
Another, perhaps easier, way to do it efficiently is to use a matrix and the first set of equations. (Sorry for the ascii art -- this diagram is a vector = matrix * vector):
(D(n)) = (M-2 1) * (D(n-1))
(S(n)) = (M-1 0) (S(n-1))
Telescoping this, and using that D(2)=1, S(2)=0:
(D(n)) = (M-2 1)^(n-2) (1)
(S(n)) = (M-1 0) (0)
You can perform the matrix power using exponentiation by squaring in O(log n) time.
Here's working code, including the examples (which you can check produce the same values as the code above). Most of the code is actually matrix multiply and matrix power -- you can probably replace a lot of it with numpy code if you use that package.
def mat_mul(M, N, p):
    R = [[0, 0], [0, 0]]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                R[i][j] += M[i][k] * N[k][j]
            R[i][j] %= p
    return R

def mat_pow(M, n, p):
    if n == 0:
        return [[1, 0], [0, 1]]
    if n == 1:
        return M
    if n % 2 == 0:
        R = mat_pow(M, n//2, p)
        return mat_mul(R, R, p)
    return mat_mul(M, mat_pow(M, n-1, p), p)

def Cmat(n, m, p):
    M = [((m-2), 1), (m-1, 0)]
    M = mat_pow(M, n-2, p)
    return M[1][0], M[0][0]

p = 2**64 - 59
print(Cmat(4, 128, p))
print(Cmat(5, 128, p))
print(Cmat(10**15, 128, p))
You only need to count the number of acceptable sequences, not find them explicitly. It turns out that it doesn't matter what the majority of the characters are. There are only 4 kinds of characters that matter:
The first character
The last character
The last-used character, so you don't repeat characters consecutively
All other characters
In other words, you don't need to iterate over all 10^15 characters. You only need to consider the four cases above, since most characters can be lumped together into the last case.
Given a number n of x digits, how can we remove y digits so that the remaining digits form the greatest possible number?
Examples:
1)x=7 y=3
n=7816295
-8-6-95
=8695
2)x=4 y=2
n=4213
4--3
=43
3)x=3 y=1
n=888
=88
Just to state: x > y > 0.
For each digit to remove: iterate through the digits left to right; if you find a digit that's less than the one to its right, remove it and stop, otherwise remove the last digit.
If the number of digits x is greater than the actual length of the number, it means there are leading zeros. Since those will be the first to go, you can simply reduce the count y by a corresponding amount.
Here's a working version in Python:
def remove_digits(n, x, y):
    s = str(n)
    if len(s) > x:
        raise ValueError
    elif len(s) < x:
        y -= x - len(s)
    if y <= 0:
        return n
    for r in range(y):
        for i in range(len(s)):
            if s[i] < s[i+1:i+2]:
                break
        s = s[:i] + s[i+1:]
    return int(s)
>>> remove_digits(7816295, 7, 3)
8695
>>> remove_digits(4213, 4, 2)
43
>>> remove_digits(888, 3, 1)
88
I hesitated to submit this, because it seems too simple. But I wasn't able to think of a case where it wouldn't work.
If x = y, we have to remove all the digits.
Otherwise, find the maximum digit among the first y + 1 digits (the leftmost one if it occurs more than once). Remove the y0 digits before this maximum, append the maximum to the answer, and repeat the same task on the remaining digits, now with y - y0 digits left to remove.
A straightforward implementation runs in O(x^2) time in the worst case.
Finding the maximum in a given range can be done efficiently with a segment tree, which gives O(x * log(x)) time in the worst case.
P.S. I just realized that it is also possible to solve this in O(x), using the fact that there are only 10 distinct digits (though the algorithm may be a little more complicated). We need to find the maximum in a given range [L, R], and the ranges in this task only move from left to right (L and R never decrease). So we store 10 pointers (one per digit value), each pointing to the first position >= L that holds that digit. To find the maximum, we need to check only these 10 pointers. To update the pointers, we move them to the right.
So the time complexity will be O(10 * x) = O(x).
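The straightforward O(x^2) variant of this idea might look like the sketch below. It works on the digit string directly; the function name `largest_after_removal` is an assumption made for this example, and neither the segment tree nor the 10-pointer optimisation is included.

```python
def largest_after_removal(digits, y):
    """Greedy: repeatedly keep the leftmost maximum among the next y + 1 digits."""
    result = []
    start = 0
    keep = len(digits) - y          # number of digits that survive
    while len(result) < keep:
        window = digits[start:start + y + 1]
        best = max(window)
        offset = window.index(best)  # the `offset` digits before the maximum are removed
        result.append(best)
        start += offset + 1
        y -= offset
    return "".join(result)


print(largest_after_removal("7816295", 3))
print(largest_after_removal("4213", 2))
print(largest_after_removal("888", 1))
```

On the three examples from the question this yields 8695, 43 and 88.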
Here's an O(x) solution. It builds an index that maps (i, d) to j, the smallest number > i such that the j'th digit of n is d. With this index, one can easily find the largest possible next digit in the solution in O(1) time.
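def index(digits):
    # nxt[d] is the first position at or after the current one that holds digit d
    # (len(digits)+1 if there is none); rows are yielded from right to left.
    nxt = [len(digits)+1] * 10
    for i in range(len(digits), 0, -1):
        nxt[ord(digits[i-1])-ord('0')] = i-1
        yield nxt[::-1]

def minseq(n, y):
    n = str(n)
    idx = list(index(n))[::-1]
    i, r = 0, []
    for ry in range(len(n)-y):
        # pick the largest digit whose next occurrence is still inside the window
        i = next(j for j in idx[i] if j <= y+ry) + 1
        r.append(n[i - 1])
    return ''.join(r)

print(minseq(7816295, 3))
print(minseq(4213, 2))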
Pseudocode:
Number.toDigits().filter (sortedSet (Number.toDigits()). take (y))
Imho you don't need to know x.
For efficiency, Number.toDigits () could be precalculated
digits = Number.toDigits()
digits.filter (sortedSet (digits).take (y))
Depending on language and context, you either output the digits and are done or have to convert the result into a number again.
Working Scala-Code for example:
def toDigits (l: Long) : List [Long] = if (l < 10) l :: Nil else (toDigits (l /10)) :+ (l % 10)
val num = 734529L
val dig = toDigits (num)
dig.filter (_ > ((dig.sorted).take(2).last))
A sorted set is a set which is sorted, which means, every element is only contained once and then the resulting collection is sorted by some criteria, for example numerical ascending. => 234579.
We take two of them (23) and from that subset the last (3) and filter the number by the criteria, that the digits have to be greater than that value (3).
Your question does not explicitly say, that each digit is only contained once in the original number, but since you didn't give a criterion, which one to remove in doubt, I took it as an implicit assumption.
Other languages may of course have other expressions (x.sorted, x.toSortedSet, new SortedSet (num), ...) or lack certain classes, functions, which you would have to build on your own.
You might need to write your own filter method, which takes a predicate P and a collection C, and returns a new collection of all elements which satisfy P, P being a method which takes one T and returns a Boolean. Very useful stuff.
I am writing code to find the nth Ramanujan-Hardy number. A Ramanujan-Hardy number is defined as
n = a^3 + b^3 = c^3 + d^3
which means n can be expressed as a sum of two cubes in two different ways.
I wrote the following code in haskell:
-- my own implementation for cube root. Expected time complexity is O(n^(1/3))
cube_root n = chelper 1 n
  where
    chelper i n = if i*i*i > n then (i-1) else chelper (i+1) n

-- It checks if the given number can be expressed as a^3 + b^3 = c^3 + d^3 (is Ramanujan-Hardy number?)
is_ram n = length [a| a<-[1..crn], b<-[(a+1)..crn], c<-[(a+1)..crn], d<-[(c+1)..crn], a*a*a + b*b*b == n && c*c*c + d*d*d == n] /= 0
  where
    crn = cube_root n

-- It finds nth Ramanujan number by iterating from 1 till the nth number is found. In recursion, if x is Ramanujan number, decrement n. In either case increment x. If n is 0, preceding number was desired Ramanujan number.
ram n = give_ram 1 n
  where
    give_ram x 0 = (x-1)
    give_ram x n = if is_ram x then give_ram (x+1) (n-1) else give_ram (x+1) n
In my opinion, time complexity to check if a number is Ramanujan number is O(n^(4/3)).
On running this code in ghci, it is taking time even to find 2nd Ramanujan number.
What are possible ways to optimize this code?
First a small clarification of what we're looking for. A Ramanujan-Hardy number is one which may be written two different ways as a sum of two cubes, i.e. a^3+b^3 = c^3 + d^3 where a < b and a < c < d.
An obvious idea is to generate all of the cube-sums in sorted order and then look for adjacent sums which are the same.
Here's a start - a function which generates all of the cube sums with a given first cube:
cubes a = [ (a^3+b^3, a, b) | b <- [a+1..] ]
All of the possible cube sums in order is just:
allcubes = sort $ concat [ cubes 1, cubes 2, cubes 3, ... ]
but of course this won't work since concat and sort don't work
on infinite lists.
However, since cubes a is an increasing sequence we can sort all of
the sequences together by merging them:
allcubes = cubes 1 `merge` cubes 2 `merge` cubes 3 `merge` ...
Here we are taking advantage of Haskell's lazy evaluation. The definition
of merge is just:
merge [] bs = bs
merge as [] = as
merge as@(a:at) bs@(b:bt)
  = case compare a b of
      LT -> a : merge at bs
      EQ -> a : b : merge at bt
      GT -> b : merge as bt
We still have a problem since we don't know where to stop. We can solve that
by having cubes a initiate cubes (a+1) at the appropriate time, i.e.
cubes a = ...an initial part... ++ (...the rest... `merge` cubes (a+1) )
The definition is accomplished using span:
cubes a = first ++ (rest `merge` cubes (a+1))
  where
    s = (a+1)^3 + (a+2)^3
    (first, rest) = span (\(x,_,_) -> x < s) [ (a^3+b^3,a,b) | b <- [a+1..]]
So now cubes 1 is the infinite series of all the possible sums a^3 + b^3 where a < b in sorted order.
To find the Ramanujan-Hardy numbers, we just group adjacent elements of the list together which have the same first component:
sameSum (x,a,b) (y,c,d) = x == y
rjgroups = groupBy sameSum $ cubes 1
The groups we are interested in are those whose length is > 1:
rjnumbers = filter (\g -> length g > 1) rjgroups
The first 10 solutions are:
ghci> take 10 rjnumbers
[(1729,1,12),(1729,9,10)]
[(4104,2,16),(4104,9,15)]
[(13832,2,24),(13832,18,20)]
[(20683,10,27),(20683,19,24)]
[(32832,4,32),(32832,18,30)]
[(39312,2,34),(39312,15,33)]
[(40033,9,34),(40033,16,33)]
[(46683,3,36),(46683,27,30)]
[(64232,17,39),(64232,26,36)]
[(65728,12,40),(65728,31,33)]
Your is_ram function checks for a Ramanujan number by trying all values for a,b,c,d up to the cuberoot, and then looping over all n.
An alternative approach would be to simply loop over values for a and b up to some limit and increment an array at index a^3+b^3 by 1 for each choice.
The Ramanujan numbers can then be found by iterating over non-zero values in this array and returning places where the array content is >=2 (meaning that at least 2 ways have been found of computing that result).
I believe this would be O(n^(2/3)) compared to your method, which is O(n * n^(4/3)).
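A rough sketch of this counting idea, written in Python rather than Haskell and using a dictionary in place of a large array. The function name `ramanujan_numbers` and the `limit` parameter are assumptions made for this illustration.

```python
from collections import defaultdict


def ramanujan_numbers(limit):
    """Return all n <= limit that are a^3 + b^3 (0 < a < b) in at least two ways."""
    ways = defaultdict(int)          # ways[n] = number of pairs (a, b) found for n
    a = 1
    while 2 * a ** 3 < limit:        # a < b implies a^3 + b^3 > 2 * a^3
        b = a + 1
        while a ** 3 + b ** 3 <= limit:
            ways[a ** 3 + b ** 3] += 1
            b += 1
        a += 1
    return sorted(n for n, count in ways.items() if count >= 2)


print(ramanujan_numbers(70000))
```

With a limit of 70000 this should reproduce the ten sums listed in the other answer, starting with 1729 and 4104. To get the nth number for an unknown n, one option is to keep increasing the limit until at least n results come back.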
I want to solve UVA 10298 -"Power Strings" problem using KMP algorithm. In this blog a technique is shown how failure function can be used to calculate minimum length repeated substring. The technique is as follows:
Compute prefix-suffix table pi[ ] for the given string.
Let len be the string length, and last_in_pi be the value stored at the last index of pi table.
Check whether len % (len - last_in_pi) == 0 is true or not. If it is true then the length of the minimum length repeated substring is (len - last_in_pi), otherwise it is the length of the given string.
I understand what is failure function and how it is used to find pattern in a text but I am struggling to understand proof of correctness of this technique.
Remember that Pi[i] is defined as the (length of the) longest prefix of your_string that is a proper suffix (so not the whole string) of the substring your_string[0 ... i].
There is an example on the blog post you linked to:
     0 1 2 3 4 5
S  : a b a b a b
Pi : 0 0 1 2 3 4
Where we have:
a b a
a b a b
Etc. I hope this makes it clear what Pi (the prefix function / table) does.
Now, the blog says:
The last value of prefix table = 4..
Now If it is a repeated string than , It’s minimal length would be 2. (6(string length) – 4) , Now
So you have to check if len % (len - last_in_pi) == 0. If yes, then len - last_in_pi is the length of the shortest repeated string (the period string).
This works because, if you rotate a string with len(period) positions either way, it will match itself. len - last_in_pi tells you how much you'd need to rotate.
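To make the check concrete, here is a minimal Python sketch of the prefix function and the period test described above. The names `prefix_function` and `shortest_repeating_unit` are made up for this example; this is the standard way of computing the table, not necessarily the exact code from the blog.

```python
def prefix_function(s):
    """pi[i] = length of the longest proper suffix of s[0..i] that is also a prefix of s."""
    pi = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]            # fall back to the next shorter border
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def shortest_repeating_unit(s):
    """Length of the shortest string that, repeated a whole number of times, gives s."""
    n = len(s)
    if n == 0:
        return 0
    p = n - prefix_function(s)[-1]   # len - last_in_pi
    return p if n % p == 0 else n


print(prefix_function("ababab"))
print(shortest_repeating_unit("ababab"))
```

For "ababab" the table matches the one shown above and the unit length is 2. For the "Power Strings" problem the answer would then be `len(s) // shortest_repeating_unit(s)`.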
Problem
S (of length Ls) is the given string. M (of length Lm) is the largest proper suffix of S, which is also a prefix of S. We have to prove Ls - Lm is the length of the shortest period of S.
Proof by Contradiction
Let's say there were a period Y whose length Ly < Ls - Lm (i.e, it's shorter than the one the above technique gives).
An important property to note is that M is a proper prefix of Y or vice-versa depending on their lengths. We can denote this as M = n*Y + Z, where n >= 0 and Z is the additional part and Lz < Ly. Z forms a prefix to Y, since Y repeats itself. Let Y = Z + W.
Consider M the suffix. Append the previous Ly number of characters from the original string S to it. This won't exceed the string length because (Ly < Ls - Lm). The new suffix is (n + 1)*Y + Z.
Consider M the prefix. Now append the next Ly number of characters from the original string S to it. The new prefix here is
M + (next Ly characters from S)
-> n*Y + Z + (Ly characters)
-> n*Y + Z + (Ly - Lz characters) + (Lz characters)
-> n*Y + (Z + W) + (Z)
{The `Ly - Lz` characters must be `W`, because `Z` and these characters together form `Y`; the last Lz characters are the first Lz characters of Y, which is just Z}
-> (n + 1)*Y + Z
Now we have a proper suffix of S which is also a prefix and is greater than M. But we started off saying M is the longest proper suffix which is also a prefix. So it's a contradiction, implying such a Y can not exist.
Assume you have a string s of size n, which looks like s = x1x2x3...x[n-2]x[n-1]x[n]
Assume s has a maximum common prefix/suffix of length len
Then its period is p = (n - len), iff n % p == 0
Induction:
Denote prefix = s[1...len], postfix = s[p+1...n]
Then we have prefix[1...p] == postfix[1...p] == s[p+1...2p]
Since s[p+1...2p] == prefix[p+1...2p] so postfix[1...p] == postfix[p+1...2p]
Recursively postfix[p+1...2p] == s[2p+1...3p] == prefix[2p+1...3p]
...
Let me start with an example -
I have a range of numbers from 1 to 9. And let's say the target number that I want is 29.
In this case the minimum number of operations required would be 2: (9*3)+2 = 29. Similarly, for 18 the minimum number of operations is 1 (9*2=18).
I can use any of the 4 arithmetic operators - +, -, / and *.
How can I programmatically find out the minimum number of operations required?
Thanks in advance for any help provided.
clarification: integers only, no decimals allowed mid-calculation. i.e. the following is not valid (from comments below): ((9/2) + 1) * 4 == 22
I must admit I didn't think about this thoroughly, but for my purpose it doesn't matter if decimal numbers appear mid-calculation. ((9/2) + 1) * 4 == 22 is valid. Sorry for the confusion.
For the special case where set Y = [1..9] and n > 0:
n <= 9 : 0 operations
n <=18 : 1 operation (+)
otherwise : Remove any divisor found in Y. If this is not enough, do a recursion on the remainder for all offsets -9 .. +9. Offset 0 can be skipped as it has already been tried.
Notice how division is not needed in this case. For other Y this does not hold.
This algorithm is exponential in log(n). The exact analysis is a job for somebody with more knowledge about algebra than I.
For more speed, add pruning to eliminate some of the search for larger numbers.
Sample code:
def findop(n, maxlen=9999):
    # Return a short postfix list of numbers and operations

    # Simple solution to small numbers
    if n<=9: return [n]
    if n<=18: return [9,n-9,'+']

    # Find direct multiply
    x = divlist(n)
    if len(x) > 1:
        mults = len(x)-1
        x[-1:] = findop(x[-1], maxlen-2*mults)
        x.extend(['*'] * mults)
        return x

    shortest = 0

    # Try every offset +1..+9 and -1..-9 (list() is needed on Python 3)
    for o in list(range(1,10)) + list(range(-1,-10,-1)):
        x = divlist(n-o)
        if len(x) == 1: continue
        mults = len(x)-1

        # We spent len(divlist) + mults + 2 fields for offset.
        # The last number is expanded by the recursion, so it doesn't count.
        recursion_maxlen = maxlen - len(x) - mults - 2 + 1

        if recursion_maxlen < 1: continue
        x[-1:] = findop(x[-1], recursion_maxlen)
        x.extend(['*'] * mults)
        if o > 0:
            x.extend([o, '+'])
        else:
            x.extend([-o, '-'])
        if shortest == 0 or len(x) < shortest:
            shortest = len(x)
            maxlen = shortest - 1
            solution = x[:]

    if shortest == 0:
        # Fake solution, it will be discarded
        return '#' * (maxlen+1)
    return solution

def divlist(n):
    l = []
    for d in range(9,1,-1):
        while n%d == 0:
            l.append(d)
            n = n//d  # integer division keeps n an int on Python 3
    if n>1: l.append(n)
    return l
The basic idea is to test all possibilities with k operations, for k starting from 0. Imagine you create a tree of height k that branches for every possible new operation with operand (4*9 branches per level). You need to traverse and evaluate the leaves of the tree for each k before moving to the next k.
I didn't test this pseudo-code:
for every k from 0 to infinity
    for every n from 1 to 9
        if compute(n,0,k):
            return k

boolean compute(n,j,k):
    if (j == k):
        return (n == target)
    else:
        for each operator in {+,-,*,/}:
            for every i from 1 to 9:
                if compute((n operator i),j+1,k):
                    return true
        return false
It doesn't take into account arithmetic operators precedence and braces, that would require some rework.
Really cool question :)
Notice that you can start from the end! From your example (9*3)+2 = 29 is equivalent to saying (29-2)/3=9. That way we can avoid the double loop in cyborg's answer. This suggests the following algorithm for set Y and result r:
nextleaves = {r}
nops = 0
while(true):
    nops = nops+1
    leaves = nextleaves
    nextleaves = {}
    for leaf in leaves:
        for y in Y:
            if (leaf+y) or (leaf-y) or (leaf*y) or (leaf/y) is in Y:
                return(nops)
            else:
                add (leaf+y) and (leaf-y) and (leaf*y) and (leaf/y) to nextleaves
This is the basic idea; performance can certainly be improved, for instance by avoiding "backtracks", such as r+a-a or r*a*b/a.
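A runnable sketch of this backward search, under some assumptions that go beyond the pseudocode: only left-to-right chains such as ((9*3)+2) are considered, only exact integer divisions are followed (the "integers only" reading of the question), results already seen are skipped, and the search gives up after `max_ops` levels. The function name `min_operations` and its parameters are made up for this example.

```python
def min_operations(target, digits=range(1, 10), max_ops=8):
    """Backward breadth-first search: undo one operation per level until a digit is reached."""
    digits = list(digits)            # assumed to be non-zero integers
    if target in digits:
        return 0
    leaves = {target}
    seen = {target}                  # avoids revisiting values such as r+a-a
    for nops in range(1, max_ops + 1):
        next_leaves = set()
        for leaf in leaves:
            for y in digits:
                candidates = [leaf + y, leaf - y, leaf * y]
                if leaf % y == 0:    # integers only: follow exact divisions only
                    candidates.append(leaf // y)
                for value in candidates:
                    if value in digits:
                        return nops
                    if value not in seen:
                        seen.add(value)
                        next_leaves.add(value)
        leaves = next_leaves
    return None                      # nothing found within max_ops operations


print(min_operations(29), min_operations(18), min_operations(5))
```

For the examples in the question this gives 2 operations for 29 and 1 for 18. Expressions that need a different bracket structure, for example a product of two sums, are not covered by this chain model.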
I guess my idea is similar to the one of Peer Sommerlund:
For big numbers, you advance fast by multiplying with big digits.
Is Y=29 prime? If not, divide it by its largest divisor in the range 2 to 9.
Else you could subtract a number to reach a divisible number. 27 is fine, since it is divisible by 9, so
(29-2)/9=3 =>
3*9+2 = 29
So maybe - I didn't think about this to the end: Search the next divisible by 9 number below Y. If you don't reach a number which is a digit, repeat.
The formula is the steps reversed.
(I'll try it for some numbers. :) )
I tried with 2551, which is
echo $((((3*9+4)*9+4)*9+4))
But I didn't test every intermediate result whether it is prime.
But
echo $((8*8*8*5-9))
is 2 operations less. Maybe I can investigate this later.