Rabin-Karp constant-time substring hash lookup

Given a string S[0..N-1], I want to be able to get the hash of any of its substrings S[i..j] in O(1) time, with O(N) preprocessing. Here's what I have so far (in Python):
BASE = 31
word = "abcxxabc"
N = len(word)
powers = []
prefix_hashes = []
b = 1
h = 0
for i in range(N):
    powers.append(b)
    b *= BASE
    h += (ord(word[i]) - ord('a') + 1)
    h *= BASE
    prefix_hashes.append(h)
def get_hash(i, j):
    if i == 0:
        return prefix_hashes[j]
    return prefix_hashes[j] - prefix_hashes[i - 1] * powers[j - i + 1]
It works very well... in Python, where I don't need to worry about integer overflow. I want to perform the same operations (effectively rewrite the whole algorithm), but modulo some big prime, so that everything fits in 32-bit integer arithmetic. Here's what I came up with:
MOD = 10**9 + 7
BASE = 31
word = "abcxxabc"
N = len(word)
powers = []
prefix_hashes = []
b = 1
h = 0
for i in range(N):
    powers.append(b)
    b = (b * BASE) % MOD
    h += (ord(word[i]) - ord('a') + 1)
    h = (h * BASE) % MOD
    prefix_hashes.append(h)
def get_hash(i, j):
    if i == 0:
        return prefix_hashes[j]
    return (prefix_hashes[j] - prefix_hashes[i - 1] * powers[j - i + 1]) % MOD
But the
prefix_hashes[i - 1] * powers[j - i + 1]
part can overflow quite easily for larger MOD values. How should I go about it?
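For reference, a minimal sketch of one standard remedy (my addition, not part of the original post): reduce the product modulo MOD before subtracting, so no intermediate exceeds MOD^2. In a fixed-width language, the one remaining product of two residues below 2^31 fits in a 64-bit intermediate (e.g. int64_t).

def get_hash(i, j):
    # Each factor is already < MOD, so the single product is < MOD**2;
    # in C/C++/Java, compute it in a 64-bit integer before reducing.
    if i == 0:
        return prefix_hashes[j]
    sub = prefix_hashes[i - 1] * powers[j - i + 1] % MOD
    return (prefix_hashes[j] - sub) % MOD  # Python's % is never negative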

Related

Explanation for Booth's Algorithm for Lexicographically minimal string rotation

Can someone please explain or comment on this code for Booth's Algorithm for lexicographically minimal string rotation from Wikipedia (https://en.wikipedia.org/wiki/Lexicographically_minimal_string_rotation#Booth.27s_Algorithm)?
def least_rotation(S: str) -> int:
    """Booth's algorithm."""
    S += S  # Concatenate string to itself to avoid modular arithmetic
    f = [-1] * len(S)  # Failure function
    k = 0  # Least rotation of string found so far
    for j in range(1, len(S)):
        sj = S[j]
        i = f[j - k - 1]
        while i != -1 and sj != S[k + i + 1]:
            if sj < S[k + i + 1]:
                k = j - i - 1
            i = f[i]
        if sj != S[k + i + 1]:  # if sj != S[k+i+1], then i == -1
            if sj < S[k]:  # k+i+1 = k
                k = j
            f[j - k] = -1
        else:
            f[j - k] = i + 1
    return k
I am lost with the indices, the role of k, etc. What does it mean, and why is there [j - k - 1] indexing?
Thanks!
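For orientation, a quick way to see what the function is supposed to compute (my example, not from the question): k indexes the lexicographically smallest rotation, which can be cross-checked by brute force.

s = "dbca"
k = least_rotation(s)
print(k, s[k:] + s[:k])  # expect 3 and "adbc"
print(s[k:] + s[:k] == min(s[i:] + s[:i] for i in range(len(s))))  # expect True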

How can I solve this problem using dynamic programming?

Given a list of numbers, say [4 5 2 3], I need to maximize the sum obtained according to the following set of rules:
I need to select a number from the list and that number will be removed.
Eg. selecting 2 will have the list as [4 5 3].
If the number to be removed has two neighbours then I should get the result of this selection as the product of the currently selected number with one of its neighbours, and this product summed up with the other neighbour. E.g., if I select 2 then I can have the result of this selection as 2 * 5 + 3.
If I select a number with only one neighbour then the result is the product of the selected number with its neighbour.
When there is only one number left, it is just added to the result so far.
Following these rules, I need to select the numbers in such an order that the result is maximized.
For the above list, if the order of selection is 4->2->3->5 then the sum obtained is 53, which is the maximum.
I am including a program which lets you pass the set of elements as input; it prints all possible sums and also indicates the maximum sum.
import itertools

l = [int(i) for i in input().split()]
p = itertools.permutations(l)
c, cs = 1, -1
mm = -1
for i in p:
    var, s = l[:], 0
    print(c, ':', i)
    c += 1
    for j in i:
        print(' removing: ', j)
        pos = var.index(j)
        if pos == 0 or pos == len(var) - 1:
            if pos == 0 and len(var) != 1:
                s += var[pos] * var[pos + 1]
                var.remove(j)
            elif pos == 0 and len(var) == 1:
                s += var[pos]
                var.remove(j)
            if pos == len(var) - 1 and pos != 0:
                s += var[pos] * var[pos - 1]
                var.remove(j)
        else:
            mx = max(var[pos - 1], var[pos + 1])
            mn = min(var[pos - 1], var[pos + 1])
            s += var[pos] * mx + mn
            var.remove(j)
        if s > mm:
            mm = s
            cs = c - 1
        print(' modified list: ', var, '\n sum:', s)
print('MAX SUM was', mm, ' at', cs)
Consider 4 variants of the problem: the one where every element gets consumed, and the ones where either the left, the right, or both endpoint elements are not consumed.
In each case, you can consider the last element to be removed; this breaks the problem down into 1 or 2 subproblems.
This solves the problem in O(n^3) time. Here's a Python program that implements it. The 4 variants of solve_ correspond to none, one or the other, or both of the endpoints being fixed. No doubt this program can be reduced (there's a lot of duplication).
def solve_00(seq, n, m, cache):
    key = ('00', n, m)
    if key in cache:
        return cache[key]
    assert m >= n
    if n == m:
        return seq[n]
    best = -1e9
    for i in range(n, m+1):
        left = solve_01(seq, n, i, cache) if i > n else 0
        right = solve_10(seq, i, m, cache) if i < m else 0
        best = max(best, left + right + seq[i])
    cache[key] = best
    return best

def solve_01(seq, n, m, cache):
    key = ('01', n, m)
    if key in cache:
        return cache[key]
    assert m >= n + 1
    if m == n + 1:
        return seq[n] * seq[m]
    best = -1e9
    for i in range(n, m):
        left = solve_01(seq, n, i, cache) if i > n else 0
        right = solve_11(seq, i, m, cache) if i < m - 1 else 0
        best = max(best, left + right + seq[i] * seq[m])
    cache[key] = best
    return best

def solve_10(seq, n, m, cache):
    key = ('10', n, m)
    if key in cache:
        return cache[key]
    assert m >= n + 1
    if m == n + 1:
        return seq[n] * seq[m]
    best = -1e9
    for i in range(n+1, m+1):
        left = solve_11(seq, n, i, cache) if i > n + 1 else 0
        right = solve_10(seq, i, m, cache) if i < m else 0
        best = max(best, left + right + seq[n] * seq[i])
    cache[key] = best
    return best

def solve_11(seq, n, m, cache):
    key = ('11', n, m)
    if key in cache:
        return cache[key]
    assert m >= n + 2
    if m == n + 2:
        return max(seq[n] * seq[n+1] + seq[n+2], seq[n] + seq[n+1] * seq[n+2])
    best = -1e9
    for i in range(n + 1, m):
        left = solve_11(seq, n, i, cache) if i > n + 1 else 0
        right = solve_11(seq, i, m, cache) if i < m - 1 else 0
        best = max(best, left + right + seq[i] * seq[n] + seq[m], left + right + seq[i] * seq[m] + seq[n])
    cache[key] = best
    return best

for c in [[1, 1, 1], [4, 2, 3, 5], [1, 2], [1, 2, 3], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]:
    print(c, solve_00(c, 0, len(c)-1, dict()))

Using matrices to find the number of different ways to write n as the sum of 1, 3, and 4?

This is a question given in this presentation on Dynamic Programming.
I have implemented the algorithm using recursion and it works fine for small values, but when n is greater than 30 it becomes really slow. The presentation mentions that for large values of n one should consider something similar to the matrix form of Fibonacci numbers. I am having trouble understanding how to use the matrix form of Fibonacci numbers to come up with a solution. Can someone give me some hints or pseudocode?
Thanks
Yes, you can use the technique from fast Fibonacci implementations to solve this problem in time O(log n)! Here's how to do it.
Let's go with the definition from the problem statement that 1 + 3 is counted separately from 3 + 1. Then you have the following recurrence relation:
A(0) = 1
A(1) = 1
A(2) = 1
A(3) = 2
A(k+4) = A(k) + A(k+1) + A(k+3)
The matrix trick here is to notice that
| 1 0 1 1 | |A( k )|   |A(k) + A(k-2) + A(k-3)|   |A(k+1)|
| 1 0 0 0 | |A(k-1)|   |        A( k )        |   |A( k )|
| 0 1 0 0 | |A(k-2)| = |        A(k-1)        | = |A(k-1)|
| 0 0 1 0 | |A(k-3)|   |        A(k-2)        |   |A(k-2)|
In other words, multiplying a vector of the last four values in the series produces a vector with those values shifted forward by one step.
Let's call that matrix there M. Then notice that
    |A( k )|   |A(k+2)|
    |A(k-1)|   |A(k+1)|
M^2 |A(k-2)| = |A( k )|
    |A(k-3)|   |A(k-1)|
In other words, multiplying by the square of this matrix shifts the series forward two steps. More generally:
    |A( k )|   |A(k   + n)|
    |A(k-1)|   |A(k-1 + n)|
M^n |A(k-2)| = |A(k-2 + n)|
    |A(k-3)|   |A(k-3 + n)|
So multiplying by M^n shifts the series forward n steps. Now, if we want to know the value of A(n+3), we can just compute
    |A(3)|   |A(n+3)|
    |A(2)|   |A(n+2)|
M^n |A(1)| = |A(n+1)|
    |A(0)|   |A( n )|
and read off the top entry of the vector! This can be done in time O(log n) by using exponentiation by squaring. Here's some code that does just that. This uses a matrix library I cobbled together a while back:
#include "Matrix.hh"
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <algorithm>
using namespace std;
/* Naive implementation of A. */
uint64_t naiveA(int n) {
    if (n == 0) return 1;
    if (n == 1) return 1;
    if (n == 2) return 1;
    if (n == 3) return 2;
    return naiveA(n-1) + naiveA(n-3) + naiveA(n-4);
}

/* Constructs and returns the giant matrix. */
Matrix<4, 4, uint64_t> M() {
    Matrix<4, 4, uint64_t> result;
    fill(result.begin(), result.end(), uint64_t(0));
    result[0][0] = 1;
    result[0][2] = 1;
    result[0][3] = 1;
    result[1][0] = 1;
    result[2][1] = 1;
    result[3][2] = 1;
    return result;
}

/* Constructs the initial vector that we multiply the matrix by. */
Vector<4, uint64_t> initVec() {
    Vector<4, uint64_t> result;
    result[0] = 2;
    result[1] = 1;
    result[2] = 1;
    result[3] = 1;
    return result;
}

/* O(log n) time for raising a matrix to a power. */
Matrix<4, 4, uint64_t> fastPower(const Matrix<4, 4, uint64_t>& m, int n) {
    if (n == 0) return Identity<4, uint64_t>();
    auto half = fastPower(m, n / 2);
    if (n % 2 == 0) return half * half;
    else return half * half * m;
}

/* Fast implementation of A(n) using matrix exponentiation. */
uint64_t fastA(int n) {
    if (n == 0) return 1;
    if (n == 1) return 1;
    if (n == 2) return 1;
    if (n == 3) return 2;
    auto result = fastPower(M(), n - 3) * initVec();
    return result[0];
}

/* Some simple test code showing this in action! */
int main() {
    for (int i = 0; i < 25; i++) {
        cout << setw(2) << i << ": " << naiveA(i) << ", " << fastA(i) << endl;
    }
}
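Since Matrix.hh is the answerer's own library, here is a self-contained Python sketch of the same idea (my translation, assuming only the recurrence and base cases above; mat_mul, mat_pow, and fast_A are names I introduce here):

def mat_mul(X, Y):
    # Plain O(k^3) product of small square matrices.
    k = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]

def mat_pow(M, n):
    # Exponentiation by squaring: O(log n) matrix multiplications.
    k = len(M)
    R = [[int(i == j) for j in range(k)] for i in range(k)]  # identity
    while n:
        if n & 1:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        n >>= 1
    return R

def fast_A(n):
    if n < 4:
        return [1, 1, 1, 2][n]
    M = [[1, 0, 1, 1],
         [1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0]]
    # Top entry of M^(n-3) times the vector (A(3), A(2), A(1), A(0)).
    P = mat_pow(M, n - 3)
    return sum(P[0][j] * v for j, v in enumerate([2, 1, 1, 1]))

print([fast_A(i) for i in range(10)])  # [1, 1, 1, 2, 4, 6, 9, 15, 25, 40]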
Now, how would this change if 3 + 1 and 1 + 3 were treated as equivalent? This means that we can think about solving this problem in the following way:
Let A(n) be the number of ways to write n as a sum of 1s, 3s, and 4s.
Let B(n) be the number of ways to write n as a sum of 1s and 3s.
Let C(n) be the number of ways to write n as a sum of 1s.
We then have the following:
A(n) = B(n) for all n ≤ 3, since for numbers in that range the only options are to use 1s and 3s.
A(n + 4) = A(n) + B(n + 4), since your options are either (1) use a 4 or (2) not use a 4, leaving the remaining sum to use 1s and 3s.
B(n) = C(n) for all n ≤ 2, since for numbers in that range the only options are to use 1s.
B(n + 3) = B(n) + C(n + 3), since your options are either (1) use a 3 or (2) not use a 3, leaving the remaining sum to use only 1s.
C(0) = 1, since there's only one way to write 0 as a sum of no numbers.
C(n+1) = C(n), since the only way to write something with 1s is to pull out a 1 and write the remaining number as a sum of 1s.
That's a lot to take in, but do notice the following: we ultimately care about A(n), and to evaluate it, we only need to know the values of A(n), A(n-1), A(n-2), A(n-3), B(n), B(n-1), B(n-2), B(n-3), C(n), C(n-1), C(n-2), and C(n-3).
Let's imagine, for example, that we know these twelve values for some fixed value of n. We can learn those twelve values for the next value of n as follows:
C(n+1) = C(n)
B(n+1) = B(n-2) + C(n+1) = B(n-2) + C(n)
A(n+1) = A(n-3) + B(n+1) = A(n-3) + B(n-2) + C(n)
And the remaining values then shift down.
We can formulate this as a giant matrix equation:
where the state vector is (A(n), A(n-1), A(n-2), A(n-3), B(n), B(n-1), B(n-2), C(n)):
| 0 0 0 1 0 0 1 1 | |A( n )|   |A(n+1)|
| 1 0 0 0 0 0 0 0 | |A(n-1)|   |A( n )|
| 0 1 0 0 0 0 0 0 | |A(n-2)|   |A(n-1)|
| 0 0 1 0 0 0 0 0 | |A(n-3)| = |A(n-2)|
| 0 0 0 0 0 0 1 1 | |B( n )|   |B(n+1)|
| 0 0 0 0 1 0 0 0 | |B(n-1)|   |B( n )|
| 0 0 0 0 0 1 0 0 | |B(n-2)|   |B(n-1)|
| 0 0 0 0 0 0 0 1 | |C( n )|   |C(n+1)|
Let's call this gigantic matrix here M. Then if we compute
    |2|   // A(3) = 2, since 3 = 3 or 3 = 1 + 1 + 1
    |1|   // A(2) = 1, since 2 = 1 + 1
    |1|   // A(1) = 1, since 1 = 1
M^n |1|   // A(0) = 1, since 0 = (empty sum)
    |2|   // B(3) = 2, since 3 = 3 or 3 = 1 + 1 + 1
    |1|   // B(2) = 1, since 2 = 1 + 1
    |1|   // B(1) = 1, since 1 = 1
    |1|   // C(3) = 1, since 3 = 1 + 1 + 1
We'll get back a vector whose first entry is A(n+3), the number of ways to write n+3 as a sum of 1's, 3's, and 4's. (I've actually coded this up to check it - it works!) You can then use the same matrix-power technique you saw with Fibonacci numbers to solve this in time O(log n).
Here's some code doing that:
#include "Matrix.hh"
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <algorithm>
using namespace std;
/* Naive implementations of A, B, and C. */
uint64_t naiveC(int n) {
    return 1;
}
uint64_t naiveB(int n) {
    return (n < 3? 0 : naiveB(n-3)) + naiveC(n);
}
uint64_t naiveA(int n) {
    return (n < 4? 0 : naiveA(n-4)) + naiveB(n);
}

/* Constructs and returns the giant matrix. */
Matrix<8, 8, uint64_t> M() {
    Matrix<8, 8, uint64_t> result;
    fill(result.begin(), result.end(), uint64_t(0));
    result[0][3] = 1;
    result[0][6] = 1;
    result[0][7] = 1;
    result[1][0] = 1;
    result[2][1] = 1;
    result[3][2] = 1;
    result[4][6] = 1;
    result[4][7] = 1;
    result[5][4] = 1;
    result[6][5] = 1;
    result[7][7] = 1;
    return result;
}

/* Constructs the initial vector that we multiply the matrix by. */
Vector<8, uint64_t> initVec() {
    Vector<8, uint64_t> result;
    result[0] = 2;
    result[1] = 1;
    result[2] = 1;
    result[3] = 1;
    result[4] = 2;
    result[5] = 1;
    result[6] = 1;
    result[7] = 1;
    return result;
}

/* O(log n) time for raising a matrix to a power. */
Matrix<8, 8, uint64_t> fastPower(const Matrix<8, 8, uint64_t>& m, int n) {
    if (n == 0) return Identity<8, uint64_t>();
    auto half = fastPower(m, n / 2);
    if (n % 2 == 0) return half * half;
    else return half * half * m;
}

/* Fast implementation of A(n) using matrix exponentiation. */
uint64_t fastA(int n) {
    if (n == 0) return 1;
    if (n == 1) return 1;
    if (n == 2) return 1;
    if (n == 3) return 2;
    auto result = fastPower(M(), n - 3) * initVec();
    return result[0];
}

/* Some simple test code showing this in action! */
int main() {
    for (int i = 0; i < 25; i++) {
        cout << setw(2) << i << ": " << naiveA(i) << ", " << fastA(i) << endl;
    }
}
This is a very interesting sequence. It is almost but not quite the order-4 Fibonacci (a.k.a. Tetranacci) numbers. Having extracted the doubling formulas for Tetranacci from its companion matrix, I could not resist doing it again for this very similar recurrence relation.
Before we get into the actual code, some definitions and a short derivation of the formulas used are in order. Define an integer sequence A such that:
A(n) := A(n-1) + A(n-3) + A(n-4)
with initial values A(0), A(1), A(2), A(3) := 1, 1, 1, 2.
For n >= 0, this is the number of integer compositions of n into parts from the set {1, 3, 4}. This is the sequence that we ultimately wish to compute.
For convenience, define a sequence T such that:
T(n) := T(n-1) + T(n-3) + T(n-4)
with initial values T(0), T(1), T(2), T(3) := 0, 0, 0, 1.
Note that A(n) and T(n) are simply shifts of each other. More precisely, A(n) = T(n+3) for all integers n. Accordingly, as elaborated by another answer, the companion matrix for both sequences is:
[0 1 0 0]
[0 0 1 0]
[0 0 0 1]
[1 1 0 1]
Call this matrix C, and let:
a, b, c, d := T(n), T(n+1), T(n+2), T(n+3)
a', b', c', d' := T(2n), T(2n+1), T(2n+2), T(2n+3)
By induction, it can easily be shown that:
[0 1 0 0]^n   [d-c-a  c-b  b-a  a]
[0 0 1 0]     [  a    d-c  c-b  b]
[0 0 0 1]   = [  b    b+a  d-c  c]
[1 1 0 1]     [  c    c+b  b+a  d]
As seen above, for any n, C^n can be fully determined from its rightmost column alone. Furthermore, multiplying C^n with its rightmost column produces the rightmost column of C^(2n):
[d-c-a  c-b  b-a  a] [a]   [a']   [a(2d - 2c - a) + b(2c - b)]
[  a    d-c  c-b  b] [b]   [b']   [  a^2 + c^2 + 2b(d - c)   ]
[  b    b+a  d-c  c] [c] = [c'] = [  b(2a + b) + c(2d - c)   ]
[  c    c+b  b+a  d] [d]   [d']   [  b^2 + d^2 + 2c(a + b)   ]
Thus, if we wish to compute C^n for some n by repeated squaring, we need only perform one matrix-vector multiplication per step instead of a full matrix-matrix multiplication.
Now, the implementation, in Python:
# O(n) integer additions or subtractions
def A_linearly(n):
    a, b, c, d = 0, 0, 0, 1  # T(0), T(1), T(2), T(3)
    if n >= 0:
        for _ in range(+n):
            a, b, c, d = b, c, d, a + b + d
    else:  # n < 0
        for _ in range(-n):
            a, b, c, d = d - c - a, a, b, c
    return d  # because A(n) = T(n+3)

# O(log n) integer multiplications, additions, subtractions.
def A_by_doubling(n):
    n += 3  # because A(n) = T(n+3)
    if n >= 0:
        a, b, c, d = 0, 0, 0, 1  # T(0), T(1), T(2), T(3)
    else:  # n < 0
        a, b, c, d = 1, 0, 0, 0  # T(-1), T(0), T(1), T(2)
    # Unroll the final iteration to avoid computing extraneous values
    for i in reversed(range(1, abs(n).bit_length())):
        w = a*(2*(d - c) - a) + b*(2*c - b)
        x = a*a + c*c + 2*b*(d - c)
        y = b*(2*a + b) + c*(2*d - c)
        z = b*b + d*d + 2*c*(a + b)
        if (n >> i) & 1 == 0:
            a, b, c, d = w, x, y, z
        else:  # (n >> i) & 1 == 1
            a, b, c, d = x, y, z, w + x + z
    if n & 1 == 0:
        return a*(2*(d - c) - a) + b*(2*c - b)  # w
    else:  # n & 1 == 1
        return a*a + c*c + 2*b*(d - c)  # x

print(all(A_linearly(n) == A_by_doubling(n) for n in range(-1000, 1001)))
Because it was rather trivial to code, the sequence is extended to negative n in the usual way. Also provided is a simple linear implementation to serve as a point of reference.
For n large enough, the logarithmic implementation above is 10-20x faster than directly exponentiating the companion matrix with numpy, by a simple (i.e. not rigorous, and likely flawed) timing comparison. And by my estimate, it would still take ~100 years to compute A(10**12)! Even though the algorithm above has room for improvement, that number is simply too large. On the other hand, computing A(10**12) mod M for some M is much more attainable.
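To illustrate that last point, here is a sketch (my addition) of the same doubling formulas with every intermediate reduced mod M. It drops the final-iteration unrolling and the negative-n handling for simplicity, so it assumes n >= 0:

def A_mod(n, M):
    # A(n) mod M via T-doubling; assumes n >= 0.
    n += 3  # because A(n) = T(n+3)
    a, b, c, d = 0, 0, 0, 1 % M  # T(0), T(1), T(2), T(3)
    for i in reversed(range(n.bit_length())):
        w = (a*(2*(d - c) - a) + b*(2*c - b)) % M
        x = (a*a + c*c + 2*b*(d - c)) % M
        y = (b*(2*a + b) + c*(2*d - c)) % M
        z = (b*b + d*d + 2*c*(a + b)) % M
        if (n >> i) & 1:
            a, b, c, d = x, y, z, (w + x + z) % M
        else:
            a, b, c, d = w, x, y, z
    return a  # T(n) mod M

print(A_mod(10**12, 10**9 + 7))  # ~40 doubling steps, well under a second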
A direct relation to Lucas and Fibonacci numbers
It turns out that T(n) is even closer to the Fibonacci and Lucas numbers than it is to Tetranacci. To see this, note that the characteristic polynomial for T(n) is x^4 - x^3 - x - 1 = 0, which factors into (x^2 - x - 1)(x^2 + 1) = 0. The first factor is the characteristic polynomial for Fibonacci & Lucas! The 4 roots of (x^2 - x - 1)(x^2 + 1) = 0 are the two Fibonacci roots, phi and psi = 1 - phi, and the two square roots of -1, i and -i.
The closed-form expression or "Binet" formula for T(n) will have the general form:
T(n) = U(n) + V(n)
U(n) = p*(phi^n) + q*(psi^n)
V(n) = r*(i^n) + s*(-i)^n
for some constant coefficients p, q, r, s.
Using the initial values for T(n), solving for the coefficients, applying some algebra, and noting that the Lucas numbers have the closed-form expression: L(n) = phi^n + psi^n, we can derive the following relations:
       L(n+1) - L(n)   L(n-1)   F(n) + F(n-2)
U(n) = ------------- = ------ = -------------
             5            5           5
where L(n) is the n'th Lucas number with L(0), L(1) := 2, 1 and F(n) is the n'th Fibonacci number with F(0), F(1) := 0, 1. And we also have:
V(n) = |  1 / 5   if n = 0 (mod 4)
       | -2 / 5   if n = 1 (mod 4)
       | -1 / 5   if n = 2 (mod 4)
       |  2 / 5   if n = 3 (mod 4)
Which is ugly, but trivial to code. Note that the numerator of V(n) can also be succinctly expressed as cos(n*pi/2) - 2*sin(n*pi/2) or (3 - (-1)^n) / 2 * (-1)^(n(n+1)/2), but we use the piece-wise definition for clarity.
Here's an even nicer, more direct identity:
T(n) + T(n+2) = F(n)
Essentially, we can compute T(n) (and therefore A(n)) by using Fibonacci & Lucas numbers. Theoretically, this should be much more efficient than the Tetranacci-like approach.
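A quick numerical spot-check of this identity (my addition, comparing both recurrences directly):

# Verify T(n) + T(n+2) == F(n) for small n.
T = [0, 0, 0, 1]
F = [0, 1]
for n in range(4, 60):
    T.append(T[n-1] + T[n-3] + T[n-4])
for n in range(2, 60):
    F.append(F[n-1] + F[n-2])
print(all(T[n] + T[n+2] == F[n] for n in range(58)))  # True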
It is known that the Lucas numbers can be computed more efficiently than Fibonacci, therefore we will compute A(n) from the Lucas numbers. The most efficient, simple Lucas number algorithm I know of is one by L.F. Johnson (see his 2010 paper: Middle and Ripple, fast simple O(lg n) algorithms for Lucas Numbers). Once we have a Lucas algorithm, we use the identity: T(n) = L(n - 1) / 5 + V(n) to compute A(n).
# O(log n) integer multiplications, additions, subtractions
def A_by_lucas(n):
    n += 3  # because A(n) = T(n+3)
    offset = (+1, -2, -1, +2)[n % 4]
    L = lf_johnson_2010_middle(n - 1)
    return (L + offset) // 5

def lf_johnson_2010_middle(n):
    "-> n'th Lucas number. See [L.F. Johnson 2010a]."
    #: The following Lucas identities are used:
    #:
    #:     L(2n)   = L(n)^2 - 2*(-1)^n
    #:     L(2n+1) = L(2n+2) - L(2n)
    #:     L(2n+2) = L(n+1)^2 - 2*(-1)^(n+1)
    #:
    #: The first and last identities are equivalent.
    #: For the unrolled iteration, the following is also used:
    #:
    #:     L(2n+1) = L(n)*L(n+1) - (-1)^n
    #:
    #: Since this approach uses only square multiplications per loop,
    #: it turns out to be slightly faster than standard Lucas doubling,
    #: which uses 1 square and 1 regular multiplication.
    if n >= 0:
        a, b, sign = 2, 1, +1  # L(0), L(1), (-1)^0
    else:  # n < 0
        a, b, sign = -1, 2, -1  # L(-1), L(0), (-1)^(-1)
    # Unroll the last iteration to avoid computing unnecessary values
    for i in reversed(range(1, abs(n).bit_length())):
        a = a*a - 2*sign  # L(2k)
        c = b*b + 2*sign  # L(2k+2)
        b = c - a         # L(2k+1)
        sign = +1
        if (n >> i) & 1:
            a, b = b, c
            sign = -1
    if n & 1:
        return a*b - sign
    else:
        return a*a - 2*sign
You may verify that A_by_lucas produces the same results as the previous A_by_doubling function, but is roughly 5x faster. Still not fast enough to compute A(10**12) in any reasonable amount of time!
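That agreement can be spot-checked the same way as before (my addition, mirroring the earlier all(...) test):

print(all(A_by_lucas(n) == A_by_doubling(n) for n in range(-1000, 1001)))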
You can easily improve your current recursive implementation by adding memoization, which makes the solution fast again. C# code:
// Dictionary to store computed values
private static Dictionary<int, long> s_Solutions = new Dictionary<int, long>();

private static long Count134(int value) {
    if (value == 0)
        return 1;
    else if (value <= 0)
        return 0;

    long result;

    // Improvement: Do we have the value computed?
    if (s_Solutions.TryGetValue(value, out result))
        return result;

    result = Count134(value - 4) +
             Count134(value - 3) +
             Count134(value - 1);

    // Improvement: Store the value computed for future use
    s_Solutions.Add(value, result);

    return result;
}
And so you can easily call
Console.Write(Count134(500));
The outcome (which takes about 2 milliseconds) is
3350159379832610737

Something wrong with my PollardP1_rho code but I don't know how to fix it

I tried to use Miller-Rabin plus the Pollard rho method to factorize an integer into primes in Python 3, to reduce the time complexity as much as I could. But it failed some tests. I knew where the problem was, but being a beginner in algorithms I didn't know how to fix it, so I will put all the relevant code here.
import random

def gcd(a, b):
    """
    a, b: integers
    returns: a positive integer, the greatest common divisor of a & b.
    """
    if a == 0:
        return b
    if a < 0:
        return gcd(-a, b)
    while b > 0:
        c = a % b
        a, b = b, c
    return a

def mod_mul(a, b, n):
    # Calculate a * b % n iteratively.
    result = 0
    while b > 0:
        if (b & 1) > 0:
            result = (result + a) % n
        a = (a + a) % n
        b = (b >> 1)
    return result

def mod_exp(a, b, n):
    # Calculate (a ** b) % n iteratively.
    result = 1
    while b > 0:
        if (b & 1) > 0:
            result = mod_mul(result, a, n)
        a = mod_mul(a, a, n)
        b = (b >> 1)
    return result

def MillerRabinPrimeCheck(n):
    if n in {2, 3, 5, 7, 11}:
        return True
    elif (n == 1 or n % 2 == 0 or n % 3 == 0 or n % 5 == 0 or n % 7 == 0 or n % 11 == 0):
        return False
    k = 0
    u = n - 1
    while not (u & 1) > 0:
        k += 1
        u = (u >> 1)
    random.seed(0)
    s = 5  # If the result isn't right, then increase s.
    for i in range(s):
        x = random.randint(2, n - 1)
        if x % n == 0:
            continue
        x = mod_exp(x, u, n)
        pre = x
        for j in range(k):
            x = mod_mul(x, x, n)
            if (x == 1 and pre != 1 and pre != n - 1):
                return False
            pre = x
        if x != 1:
            return False
    return True

def PollardP1_rho(n, c):
    '''
    Consider c as a constant integer.
    '''
    i = 1
    k = 2
    x = random.randrange(1, n - 1) + 1
    y = x
    while 1:
        i += 1
        x = (mod_mul(x, x, n) + c) % n
        d = gcd(y - x, n)
        if 1 < d < n:
            return d
        elif x == y:
            return n
        elif i == k:
            y = x
            k = (k << 1)

result = []
def PrimeFactorsListGenerator(n):
    if n <= 1:
        pass
    elif MillerRabinPrimeCheck(n) == True:
        result.append(n)
    else:
        a = n
        while a == n:
            a = PollardP1_rho(n, random.randrange(1, n - 1) + 1)
        PrimeFactorsListGenerator(a)
        PrimeFactorsListGenerator(n // a)
When I tried to test this:
PrimeFactorsListGenerator(4)
it didn't stop; it kept looping on:
PollardP1_rho(4, random.randrange(1, 4 - 1) + 1)
I have already tested the functions before PollardP1_rho and they work normally, so I know that PollardP1_rho cannot handle the number 4 correctly, nor the number 5. How can I fix that?
I have solved it myself.
There is one mistake in the code: I should not use the variable result outside the functions as a global. I should define it inside the function and use result.extend() so that the whole recursive process works correctly. So I rewrote PollardP1_rho(n, c) and PrimeFactorsListGenerator(n):
def Pollard_rho(x, c):
    '''
    Consider c as a constant integer.
    '''
    i, k = 1, 2
    x0 = random.randint(0, x)
    y = x0
    while 1:
        i += 1
        x0 = (mod_mul(x0, x0, x) + c) % x
        d = gcd(y - x0, x)
        if d != 1 and d != x:
            return d
        if y == x0:
            return x
        if i == k:
            y = x0
            k += k

def PrimeFactorsListGenerator(n):
    result = []
    if n <= 1:
        return None
    if MillerRabinPrimeCheck(n):
        return [n]
    p = n
    while p >= n:
        p = Pollard_rho(p, random.randint(1, n - 1))
    result.extend(PrimeFactorsListGenerator(p))
    result.extend(PrimeFactorsListGenerator(n // p))
    return result

#PrimeFactorsListGenerator(400)
#PrimeFactorsListGenerator(40000)
One additional tip: you don't need the function mod_mul(a, b, n) at all, since Python integers never overflow and (a * b) % n works directly; likewise, the built-in pow(a, b, n) replaces mod_exp and is fully optimized.
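For instance (my toy example, not from the answer):

a, b, n = 123456789, 987654321, 10**9 + 7
print((a * b) % n)   # replaces mod_mul(a, b, n); Python ints never overflow
print(pow(a, b, n))  # replaces mod_exp(a, b, n); built-in modular exponentiation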

Finding the continued fraction of 2^(1/3) to very high precision

Here I'll use the notation x = [a0; a1, a2, ...] for the continued fraction a0 + 1/(a1 + 1/(a2 + ...)).
It is possible to find the continued fraction of a number by computing it and then applying the definition, but that requires at least O(n) bits of memory to find a0, a1, ..., an, and in practice it is much worse. Using double floating-point precision it is only possible to find a0, a1, ..., a19.
An alternative is to use the fact that if a, b, c are rational numbers then there exist unique rationals x, y, z such that 1/(a + b*2^(1/3) + c*2^(2/3)) = x + y*2^(1/3) + z*2^(2/3), namely
x = (a^2 - 2bc) / D,  y = (2c^2 - ab) / D,  z = (b^2 - ac) / D,  where D = a^3 + 2b^3 + 4c^3 - 6abc.
So if I represent x, y, and z to absolute precision using the boost rational lib, I can obtain floor(x + y*2^(1/3) + z*2^(2/3)) accurately using only double precision for 2^(1/3) and 2^(2/3), because I only need it to be within 1/2 of the true value. Unfortunately the numerators and denominators of x, y, and z grow considerably fast, and if you use regular floats instead the errors pile up quickly.
This way I was able to compute a0, a1, ..., a10000 in under an hour, but somehow Mathematica can do that in 2 seconds. Here's my code for reference:
#include <iostream>
#include <boost/multiprecision/cpp_int.hpp>

namespace mp = boost::multiprecision;

int main()
{
    const double t_1 = 1.259921049894873164767210607278228350570251;
    const double t_2 = 1.587401051968199474751705639272308260391493;
    mp::cpp_rational p = 0;
    mp::cpp_rational q = 1;
    mp::cpp_rational r = 0;
    for (unsigned int i = 1; i != 10001; ++i) {
        double p_f = static_cast<double>(p);
        double q_f = static_cast<double>(q);
        double r_f = static_cast<double>(r);
        uint64_t floor = p_f + t_1 * q_f + t_2 * r_f;
        std::cout << floor << ", ";
        p -= floor;
        //std::cout << floor << " " << p << " " << q << " " << r << std::endl;
        mp::cpp_rational den = (p * p * p + 2 * q * q * q +
                                4 * r * r * r - 6 * p * q * r);
        mp::cpp_rational a = (p * p - 2 * q * r) / den;
        mp::cpp_rational b = (2 * r * r - p * q) / den;
        mp::cpp_rational c = (q * q - p * r) / den;
        p = a;
        q = b;
        r = c;
    }
    return 0;
}
The Lagrange algorithm
The algorithm is described for example in Knuth's book The Art of Computer Programming, vol 2 (Ex 13 in section 4.5.3 Analysis of Euclid's Algorithm, p. 375 in 3rd edition).
Let f be a polynomial of integer coefficients whose only real root is an irrational number x0 > 1. Then the Lagrange algorithm calculates the consecutive quotients of the continued fraction of x0.
I implemented it in Python:
def cf(a, N=10):
    """
    a : list - coefficients of the polynomial,
        i.e. f(x) = a[0] + a[1]*x + ... + a[n]*x^n
    N : number of quotients to output
    """
    # Degree of the polynomial
    n = len(a) - 1
    # List of consecutive quotients
    ans = []

    def shift_poly():
        """
        Replaces polynomial f(x) with f(x+1) (shifts its graph to the left).
        """
        for k in range(n):
            for j in range(n - 1, k - 1, -1):
                a[j] += a[j+1]

    for _ in range(N):
        quotient = 1
        shift_poly()
        # While the root is > 1, shift it left
        while sum(a) < 0:
            quotient += 1
            shift_poly()
        # Otherwise, we have the next quotient
        ans.append(quotient)
        # Replace polynomial f(x) with -x^n * f(1/x)
        a.reverse()
        a = [-x for x in a]
    return ans
It takes about 1s on my computer to run cf([-2, 0, 0, 1], 10000). (The coefficients correspond to the polynomial x^3 - 2 whose only real root is 2^(1/3).) The output agrees with the one from Wolfram Alpha.
Caveat
The coefficients of the polynomials evaluated inside the function quickly become quite large integers, so this approach needs some bigint implementation in other languages. (Pure Python 3 handles it, but numpy, for example, doesn't.)
You might have more luck computing 2^(1/3) to high accuracy and then trying to derive the continued fraction from that, using interval arithmetic to determine if the accuracy is sufficient.
Here's my stab at this in Python, using Halley iteration to compute 2^(1/3) in fixed point. The dead code is an attempt to compute fixed-point reciprocals more efficiently than Python via Newton iteration; no dice.
Timing from my machine is about thirty seconds, spent mostly trying to extract the continued fraction from the fixed point representation.
prec = 40000
a = 1 << (3 * prec + 1)
two_a = a << 1
x = 5 << (prec - 2)
while True:
    x_cubed = x * x * x
    two_x_cubed = x_cubed << 1
    x_prime = x * (x_cubed + two_a) // (two_x_cubed + a)
    if -1 <= x_prime - x <= 1: break
    x = x_prime
cf = []
four_to_the_prec = 1 << (2 * prec)
for i in range(10000):
    q = x >> prec
    r = x - (q << prec)
    cf.append(q)
    if True:
        x = four_to_the_prec // r
    else:
        x = 1 << (2 * prec - r.bit_length())
        while True:
            delta_x = (x * ((four_to_the_prec - r * x) >> prec)) >> prec
            if not delta_x: break
            x += delta_x
print(cf)
