Rabin Karp algorithm for big strings - algorithm

I wrote a simple step-by-step implementation of the Rabin-Karp algorithm for substring search. It seems to work fine until the hash becomes greater than the modulus, and then it goes wrong.
Here is the code; it's quite simple:
typedef long long ll;
#define B 257
//base
#define M 2147483647
//modulus
//modulus for positive and negative values
ll mod(ll a){
return (a % M + M) % M;
}
//fast way to calculate modular power
ll power(ll n, ll e){
ll r = 1;
for(; e > 0; e >>= 1, n = (n*n) % M)
if(e&1) r = (r * n) % M;
return r;
}
//function to calculate the initial hash
//H(s) = s[0] * B^0 + s[1] * B^1 + ...
ll H(char sub[], int s){
ll h = 0;
for(ll i = 0; i < s; i++)
h = mod(h + mod(power(B, i) * sub[i]));
return h;
}
//brute force comparing when hashes match
bool check(char text[], char sub[], int ini, int s){
int i = 0;
while(text[ini + i] == sub[i] && i < s) i++;
return i == s;
}
//all together here
void RabinKarp(char text[], char sub[]){
int t = strlen(text), s = strlen(sub);
ll hs = H(sub, s), ht = H(text, s);
int lim = t - s;
for(int i = 0; i <= lim; i++){
if(ht == hs)
if(check(text, sub, i, s))
printf("MATCH AT %d\n", i);
ht -= text[i];
ht /= B;
ht = mod(ht + power(B, s - 1) * text[i + s]);
//we had text[i] * B^0 + text[i+1] * B^1 + ... + text[i + len - 1] * B^(len-1)
//then text[i+1] * B^1 + text[i+2] * B^2 + ... + text[i + len - 1] * B^(len-1)
//then text[i+1] * B^0 + text[i+2] * B^1 + ... + text[i + len - 1] * B^(len-2)
//finally we add a new last term text[i + len] * B^(len-1)
//so we moved the hash to the next position
}
}
int main(){
char text[] = "uvauvauvaaauva";
char sub[] = "uva";
char sub2[] = "uvauva";
RabinKarp(text, sub);
printf("----------------------------\n");
RabinKarp(text, sub2);
}
The problem is that after I take the modulus, the hash can become a small number and then, when I add some big factor to it, the hashes may not match even when they should.
For example, take abc inside xabc.
Suppose the hashes of abc and xab are both bigger than the modulus, so they become small after the modulus operation.
Then, when I remove the 'x' term and add the 'c' term, the sum can be smaller than the modulus but still big, so it won't match.
How can I overcome this problem?

ht /= B;
is not valid here, for two reasons. First, you are doing arithmetic mod M, and division mod M means multiplying by the modular inverse of B, which is not the same as ordinary integer division. Second, x and x + M represent the same value mod M, so they should give the same answer, but integer division by B gives different results for them.
You have text[i] * B^0 + text[i+1] * B^1 + ... + text[i + len - 1] * B^(len-1)
If you work instead with
text[i] * B^(len-1) + text[i+1] * B^(len - 2) + ... + text[i + len - 1] * B^0
you can subtract text[i] * B^(len-1), multiply by B, and then add text[i + len], all mod M, so no division is needed.
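The reversed polynomial described above can be sketched in C++ as follows. This is a minimal illustration, not the asker's code: the function name rabin_karp, the std::string interface, and the returned vector of match positions are assumptions made for the example, while B and M are the values from the question.

```cpp
#include <cassert>
#include <string>
#include <vector>

using ll = long long;
const ll B = 257;          // base, as in the question
const ll M = 2147483647LL; // modulus, as in the question

// Returns every index where sub occurs in text.
// Hash used: s[0]*B^(len-1) + s[1]*B^(len-2) + ... + s[len-1]*B^0 (mod M),
// so rolling forward needs only subtraction and multiplication.
std::vector<int> rabin_karp(const std::string& text, const std::string& sub) {
    std::vector<int> matches;
    const int t = static_cast<int>(text.size());
    const int s = static_cast<int>(sub.size());
    assert(s > 0);
    if (s > t) return matches;

    ll top = 1; // B^(s-1) mod M, the weight of the leading character
    for (int i = 1; i < s; ++i) top = top * B % M;

    ll hs = 0, ht = 0;
    for (int i = 0; i < s; ++i) {
        hs = (hs * B + static_cast<unsigned char>(sub[i])) % M;
        ht = (ht * B + static_cast<unsigned char>(text[i])) % M;
    }

    for (int i = 0;; ++i) {
        // Confirm character by character, since equal hashes can collide.
        if (hs == ht && text.compare(i, s, sub) == 0) matches.push_back(i);
        if (i + s >= t) break;
        // Drop text[i] (adding M keeps the value non-negative) ...
        ht = (ht - static_cast<unsigned char>(text[i]) * top % M + M) % M;
        // ... then shift by B and append text[i + s].
        ht = (ht * B + static_cast<unsigned char>(text[i + s])) % M;
    }
    return matches;
}
```

With the strings from the question, this sketch reports "uva" at positions 0, 3, 6 and 11, and "uvauva" at positions 0 and 3.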


Multipliers (codeforces)

This is the link to the problem: https://codeforces.com/problemset/problem/615/D
My code exceeds the time limit on test 40. I have thought about it for a long time without finding a good approach. Is there a good way to optimize it?
My code:
typedef long long ll;
ll mod = 1e9 + 7;
ll fast_mod(ll a, ll n, ll Mod)
{
ll ans=1;
a%=Mod;
while(n)
{
if(n&1) ans=(ans*a)%Mod;
a=(a*a)%Mod;
n>>=1;
}
return ans;
}
int main()
{
std::ios::sync_with_stdio(false);
std::cin.tie(0); // IO
ll m;
cin >> m;
ll num = 1ll;
map<ll, ll> count;
for(int i = 0; i < m; i++)
{
ll p;
cin >> p;
count[p]++;
}
ll res = 1ll;
vector<ll> a;
vector<ll> b;
for(auto it = count.begin(); it != count.end(); it++)
{
a.push_back(it -> first);
b.push_back(it -> second);
}
for(int i = 0; i < a.size(); i++)
{
ll x = a[i]; // a kind of prime
ll y = b[i]; // the count of the prime
ll tmp = fast_mod(x, y * (y + 1) / 2, mod); // x^1 * x^2 * x^3 *...*x^y
for(int j = 0; j < b.size(); j++) // calculate ( tmp)^((b[0] + 1)*(b[1] + 1)*...*(b[b.size() - 1] + 1)), here b.size() is the number of different primes
tmp = fast_mod(tmp, i != j ? (b[j] + 1) : 1, mod) % mod;
res = (res * tmp % mod);
}
cout << res << endl;
return 0;
}
My approach: count the occurrences of each distinct prime. Suppose x is one of the distinct primes and y is its count. First compute tmp = x^1 * x^2 * ... * x^y.
Then raise tmp to the product of (count + 1) over the other primes, i.e. (b[0] + 1) * (b[1] + 1) * ... * (b[b.size() - 1] + 1) with the factor for x itself left out.
The inner for loop splits this exponentiation into several steps, one per prime.
Finally, res is multiplied by the resulting power of tmp.
Another formula for the product of the divisors of N is N ** (D / 2), where D is the number of divisors. D can be found from your map count by taking the product of entry->second + 1 over every entry.
This does raise the question of what to do when D is odd, which it would be if N is a perfect square. In that case it is easy to compute sqrt(N) (the exponents would all be even, so you can halve them all and take the product of the primes to half of their original exponents), and then raise sqrt(N) to the power of D. Essentially this changes N ** (D / 2) into (N ** (1 / 2)) ** D.
For example, if N = 2 * 3 * 2 = 12 (one of the examples), then D will be (2 + 1) * (1 + 1) = 6 and the product of divisors will be 12 ** (6 / 2) = 1728.
Computing N (or its square root) should be done modulo mod. Computing D should be done modulo mod - 1 (the totient of mod; mod is a prime, so its totient is just one less). Since mod - 1 is even, 2 has no modular multiplicative inverse modulo mod - 1, so we cannot "divide" D by 2 that way. When N is not a square, at least one factor entry->second + 1 is even, so that factor can be halved before it is multiplied in. When N is a square then AFAIK we're really stuck with computing its square root (that's not so bad, but multiplying by a half would have been easier).
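A possible C++ sketch of the N ** (D / 2) approach is shown below. The names pow_mod and product_of_divisors are made up for this example. It assumes no input prime is a multiple of mod, so that exponents may be reduced modulo mod - 1. When N is not a square, it halves one even factor of D before reducing, and when N is a square it works with sqrt(N) instead.

```cpp
#include <cassert>
#include <map>
#include <vector>

using ll = long long;
const ll MOD = 1000000007LL;

// Square-and-multiply: a^e mod m.
ll pow_mod(ll a, ll e, ll m) {
    ll r = 1 % m;
    a %= m;
    for (; e > 0; e >>= 1, a = a * a % m)
        if (e & 1) r = r * a % m;
    return r;
}

// primes: the prime factorization of N, with repetition.
// Returns the product of all divisors of N, modulo MOD.
ll product_of_divisors(const std::vector<ll>& primes) {
    assert(!primes.empty());
    std::map<ll, ll> count;
    for (ll p : primes) count[p]++;

    // N is a perfect square exactly when every exponent is even.
    bool is_square = true;
    for (const auto& kv : count)
        if (kv.second % 2 != 0) is_square = false;

    ll base = 1;         // N, or sqrt(N) when N is a square, mod MOD
    ll expo = 1;         // D / 2, or D when N is a square, mod MOD - 1
    bool halved = false; // whether one even factor of D was already halved
    for (const auto& kv : count) {
        ll f = kv.second + 1;
        if (!is_square && !halved && f % 2 == 0) {
            f /= 2; // exact division, done before any reduction
            halved = true;
        }
        expo = expo * (f % (MOD - 1)) % (MOD - 1);
        ll e = is_square ? kv.second / 2 : kv.second;
        base = base * pow_mod(kv.first, e, MOD) % MOD;
    }
    return pow_mod(base, expo, MOD);
}
```

Each distinct prime is visited a constant number of times, so the cost is dominated by building the map and by one modular exponentiation per distinct prime, instead of one exponentiation per pair of primes.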

C++ algorithm code for Magical sequence that will generate desired output

The Magical Sequence
A Magical Sequence is defined as shown.
Magical[1] = 0
Magical[2] = 1
Magical[n] = Magical[n-1] + 2*Magical[n-2] + 3*Magical[n-3] + ... (n-1)*Magical[1] + n*1., for n > 2
Given n (1 <= n <= 10^9 ), find Magical[n].
Example 1: input: 3
Output: 4
Explanation:
Magical[n] = 1*Magical[n-1] + 2*Magical[n-2] + 3*1
Magical[3] = 1*Magical[2] + 2*Magical[1] + 3*1
Magical[3] = 1*1 + 2*0 + 3*1
Magical[3] = 4
Example 2: input: 4
Output: 10
Magical[4] = 1*Magical[3]+2*Magical[2]+3*Magical[1]+4*1
= 1*4+2*1+3*0+4 = 10
Example 3: input: 5
Output: 26
Magical[5] = 1*Magical[4]+2*Magical[3]+3*Magical[2]+4*Magical[1]+5*1
= 1*10+2*4+3*1+4*0+5 = 26
I tried something like the following:
int CuckooNum(int n)
{
if (1 == n)
{
return 0;
}
else if (2 == n)
{
return 1;
}
std::vector<int> vec;
vec.resize(n);
vec[0] = 4;
vec[1] = 0;
vec[2] = 1;
int multiplyer = n;
int result = 0;
for (int index=3; index <= n; index++)
{
result += multiplyer * vec[index-1];
vec[index] = result;
multiplyer--;
}
return result;
}
long long func(int n)
{
if (n==1) return 0;
else if (n==2) return 1;
else return 1*func(n-1)+2*func(n-2)+n;
}
As n can be very large (10^9), a direct O(n^2) implementation is not possible.
A specific algorithm is needed. I will focus here on the algorithm, and propose an O(log n) solution.
To simplify the explanation, I rename magical[] as x[].
Moreover, we can define x[0] = 1. Then,
x[n] = x[n-1] + 2*x[n-2] + 3*x[n-3] + ... (n-1)*x[1] + n*x[0]
As
x[n-1] = 1*x[n-2] + 2*x[n-3] + ... (n-2)*x[1] + (n-1)*x[0]
It follows
x[n] - x[n-1] = x[n-1] + x[n-2] + x[n-3] + ... x[1] + x[0] = S[n-1]
where S[n] represents the sum of the terms up to n (x[0] included).
Moreover,
S[n] = S[n-1] + x[n] = 2*S[n-1] + x[n-1]
Therefore, the iterative formula can be written in a simple matrix form:
x[n] = 1*x[n-1] + 1*S[n-1]
S[n] = 1*x[n-1] + 2*S[n-1]
Or, defining the column vector (x[n] S[n])^t as Z[n]:
Z[n] = A * Z[n-1], where A is the 2x2 matrix with rows (1 1) and (1 2).
Note: this formula is valid for n >= 4 only, as the first x[n] values do not respect the simple recurrence relation.
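As a sanity check on the derivation, the following C++ sketch compares a direct evaluation of the definition with the simplified recurrence x[n] = x[n-1] + S[n-1] for small n. The function names magical_direct and magical_linear are invented for this illustration, and no modulus is applied, so it is only meant for small n.

```cpp
#include <cassert>
#include <vector>

// Direct O(n^2) evaluation of the definition, using x[0] = 1 as above.
std::vector<long long> magical_direct(int n_max) {
    assert(n_max >= 3);
    std::vector<long long> x(n_max + 1, 0);
    x[0] = 1; // convention introduced in the answer
    x[1] = 0;
    x[2] = 1;
    for (int n = 3; n <= n_max; ++n)
        for (int k = 1; k <= n; ++k)
            x[n] += k * x[n - k];
    return x;
}

// Same values from x[n] = x[n-1] + S[n-1], S[n] = S[n-1] + x[n], for n >= 4.
std::vector<long long> magical_linear(int n_max) {
    assert(n_max >= 3);
    std::vector<long long> x(n_max + 1, 0);
    x[0] = 1;
    x[1] = 0;
    x[2] = 1;
    x[3] = 4;
    long long S = 6; // S[3] = x[0] + x[1] + x[2] + x[3]
    for (int n = 4; n <= n_max; ++n) {
        x[n] = x[n - 1] + S;
        S += x[n];
    }
    return x;
}
```

Both functions give 4, 10 and 26 for n = 3, 4 and 5, matching the examples in the question.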
It follows that
Z[n] = A^(n-3) Z[3] with Z[3] = (4 6)^t
Classically, this calculation can be performed with O(log n) complexity, iteratively calculating A^2, A^4, A^8 etc.
Pay attention that the values increase rapidly: each step multiplies them by roughly 2.6, so a 64-bit integer overflows after a few dozen terms.
Here is an example of C++ implementation. Note that this implementation is not optimized, as for example it doesn't use the fact that all matrices are symmetric.
#include <iostream>
#include <array>
using Matr22 = std::array<std::array<long long int, 2>, 2>;
using Vect2 = std::array<long long int, 2>;
Matr22 Matrsquare (const Matr22 &m) {
Matr22 m2;
m2[0][0] = m[0][0]*m[0][0] + m[0][1]*m[1][0];
m2[0][1] = m[0][0]*m[0][1] + m[0][1]*m[1][1];
m2[1][0] = m[1][0]*m[0][0] + m[1][1]*m[1][0];
m2[1][1] = m[1][0]*m[0][1] + m[1][1]*m[1][1];
return m2;
}
Matr22 Mult (const Matr22 &m1, const Matr22 &m2) {
Matr22 y;
y[0][0] = m1[0][0]*m2[0][0] + m1[0][1]*m2[1][0];
y[0][1] = m1[0][0]*m2[0][1] + m1[0][1]*m2[1][1];
y[1][0] = m1[1][0]*m2[0][0] + m1[1][1]*m2[1][0];
y[1][1] = m1[1][0]*m2[0][1] + m1[1][1]*m2[1][1];
return y;
}
Vect2 Mult (const Matr22 &m, const Vect2& x) {
Vect2 y;
y[0] = m[0][0] * x[0] + m[0][1] * x[1];
y[1] = m[1][0] * x[0] + m[1][1] * x[1];
return y;
}
// Matrix exponentiation
Matr22 Mult_exp (const Matr22 &m, int exp) {
Matr22 y = {1, 0, 0, 1};
if (exp == 0) return y;
Matr22 M2k = m;
while (exp) {
if (exp%2) y = Mult (y, M2k);
M2k = Matrsquare (M2k);
exp /= 2;
};
return y;
}
long long int Magical (int n) {
if (n == 1) return 0;
if (n == 2) return 1;
if (n == 3) return 4;
Matr22 A = {1, 1, 1, 2};
Vect2 z = {4, 6}; // corresponds to n=3
auto Ak = Mult_exp (A, n-3);
z = Mult (Ak, z);
return z[0];
}
int main() {
int n;
std::cout << "Input n: ";
std::cin >> n;
auto ans = Magical (n);
std::cout << "Magical[" << n << "] = " << ans << '\n';
}
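Because the values outgrow 64 bits quickly, large n only makes sense if the result is reduced modulo some number. The problem statement as quoted does not give one, so the sketch below assumes a hypothetical modulus of 1000000007. The names Mat, mul_mod and magical_mod are likewise made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Mat = std::array<std::array<std::uint64_t, 2>, 2>;

// Hypothetical modulus: the quoted problem statement does not specify one.
constexpr std::uint64_t MOD = 1000000007ULL;

// 2x2 matrix product with every entry reduced mod MOD.
Mat mul_mod(const Mat& a, const Mat& b) {
    Mat c{}; // zero-initialized
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                c[i][j] = (c[i][j] + a[i][k] * b[k][j]) % MOD;
    return c;
}

// Magical[n] mod MOD via Z[n] = A^(n-3) * Z[3], with Z[3] = (4 6)^t.
std::uint64_t magical_mod(long long n) {
    assert(n >= 1);
    if (n == 1) return 0;
    if (n == 2) return 1;
    if (n == 3) return 4;
    Mat result = {{{1, 0}, {0, 1}}}; // identity
    Mat base = {{{1, 1}, {1, 2}}};   // the matrix A
    for (long long e = n - 3; e > 0; e >>= 1) {
        if (e & 1) result = mul_mod(result, base);
        base = mul_mod(base, base);
    }
    return (result[0][0] * 4 + result[0][1] * 6) % MOD;
}
```

Every entry stays below MOD, so each product fits comfortably in 64 bits, and the loop runs about log2(n) times.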

Finding sum of geometric sequence with modulo 10^9+7 with my program

The problem is given as:
Output the answer of (A^1+A^2+A^3+...+A^K) modulo 1,000,000,007, where 1 ≤ A, K ≤ 10^9, and A and K are integers.
I am trying to write a program to solve the above problem. I have tried using the formula for the sum of a geometric sequence and then applying the modulo to the answer. Since the result must be an integer as well, I assumed that finding a modular inverse is not required.
Below is the code I have now; it's in Pascal:
Var
a,k,i:longint;
power,sum: int64;
Begin
Readln(a,k);
power := 1;
For i := 1 to k do
power := ((power mod 1000000007) * a) mod 1000000007;
sum := a * (power-1) div (a-1);
Writeln(sum mod 1000000007);
End.
This task came from my school, they do not give away their test data to the students. Hence I do not know why or where my program is wrong. I only know that my program outputs the wrong answer for their test data.
If you want to do this without calculating a modular inverse, you can calculate it recursively using:
1 + A + A^2 + A^3 + ... + A^k
= 1 + (A + A^2)(1 + A^2 + (A^2)^2 + ... + (A^2)^(k/2-1))
That's for even k. For odd k:
1 + A + A^2 + A^3 + ... + A^k
= (1 + A)(1 + A^2 + (A^2)^2 + ... + (A^2)^((k-1)/2))
Since k is divided by 2 in each recursive call, the resulting algorithm has O(log k) complexity. In Java:
static int modSumAtoAk(int A, int k, int mod)
{
return (modSum1ToAk(A, k, mod) + mod-1) % mod;
}
static int modSum1ToAk(int A, int k, int mod)
{
long sum;
if (k < 5) {
//k is small -- just iterate
sum = 0;
long x = 1;
for (int i=0; i<=k; ++i) {
sum = (sum+x) % mod;
x = (x*A) % mod;
}
return (int)sum;
}
//k is big
int A2 = (int)( ((long)A)*A % mod );
if ((k%2)==0) {
// k even
sum = modSum1ToAk(A2, (k/2)-1, mod);
sum = (sum + sum*A) % mod;
sum = ((sum * A) + 1) % mod;
} else {
// k odd
sum = modSum1ToAk(A2, (k-1)/2, mod);
sum = (sum + sum*A) % mod;
}
return (int)sum;
}
Note that I've been very careful to make sure that each product is done in 64 bits, and to reduce by the modulus after each one.
With a little math, the above can be converted to an iterative version that doesn't require any storage:
static int modSumAtoAk(int A, int k, int mod)
{
// first, we calculate the sum of all 1... A^k
// we'll refer to that as SUM1 in comments below
long fac=1;
long add=0;
//INVARIANT: SUM1 = add + fac*(sum 1...A^k)
//this will remain true as we change k
while (k > 0) {
//above INVARIANT is true here, too
long newmul, newadd;
if ((k%2)==0) {
//k is even. sum 1...A^k = 1+A*(sum 1...A^(k-1))
newmul = A;
newadd = 1;
k-=1;
} else {
//k is odd.
newmul = A+1L;
newadd = 0;
A = (int)(((long)A) * A % mod);
k = (k-1)/2;
}
//SUM1 = add + fac * (newadd + newmul*(sum 1...A^k))
// = add+fac*newadd + fac*newmul*(sum 1...A^k)
add = (add+fac*newadd) % mod;
fac = (fac*newmul) % mod;
//INVARIANT is restored
}
// k == 0
long sum1 = fac + add;
return (int)((sum1 + mod -1) % mod);
}
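For readers working in C++, the same even/odd splitting can be sketched as below. The names geo_sum_from_one and geo_sum are invented for this example, and the code assumes the modulus is below 2^31 so that every product fits in a signed 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// 1 + a + a^2 + ... + a^k (mod m), using the even/odd splitting above.
std::int64_t geo_sum_from_one(std::int64_t a, std::int64_t k, std::int64_t m) {
    a %= m;
    if (k == 0) return 1 % m;
    const std::int64_t a2 = a * a % m;
    if (k % 2 == 1) {
        // odd k: (1 + a) * (1 + a^2 + ... + (a^2)^((k-1)/2))
        return (1 + a) % m * geo_sum_from_one(a2, (k - 1) / 2, m) % m;
    }
    // even k >= 2: 1 + (a + a^2) * (1 + a^2 + ... + (a^2)^(k/2-1))
    const std::int64_t inner = geo_sum_from_one(a2, k / 2 - 1, m);
    return (1 + (a + a2) % m * inner) % m;
}

// a + a^2 + ... + a^k (mod m): the sum above without its leading 1.
std::int64_t geo_sum(std::int64_t a, std::int64_t k, std::int64_t m) {
    assert(k >= 1 && m > 1);
    return (geo_sum_from_one(a, k, m) + m - 1) % m;
}
```

The recursion depth is about log2(k), so at most around 30 levels for k up to 10^9.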

Iteration n * F(n - 1)+((n - 1) * F(n - 2))

I am stuck with this: n * F(n - 1)+((n - 1) * F(n - 2)), I know how to write this recursively. But no idea about the iteration.
I use this for recursion:
long F_r(int n)
{
if (n <= 2)
{
return 1;
}
else if (n > 2)
{
return n * F_r(n - 1) + ((n - 1) * F_r(n - 2));
}
}
Can someone help me, please?
To understand the iteration, just simulate it for n = 3 or some other values (values greater than 3 help more). Let's start with n = 0, 1, 2, 3, 4, ... and see how the values of F get calculated:
F(0) = 1;
F(1) = 1;
F(2) = 1;
F(3) = 3* F(2) + (2* F(1));
= 3*1 + (2*1);
= 3 + 2;
= 5;
F(4) = 4* F(3) + (3* F(2));
= 4*5 + (3*1);
= 20 + 3;
= 23;
And so on.
With an array for storing all intermediate values of F:
long F_r(int n)
{
if (n <= 2) return 1; // base cases; also avoids writing f[2] when the array is too short
long[] f = new long [n + 1]; // f[0] is not used
f[1] = 1;
f[2] = 1;
for (int i = 3; i <= n; i++)
{
f[i] = i * f[i - 1] + ((i - 1) * f[i - 2]); // the formula goes here
}
return f[n];
}
If you want to use only O(1) space, note that you don't need to store the whole array, only the previous two values at each point of time.
So, this can be rewritten as in fgb's answer.
To write it as an iterative algorithm, you can write something in the form of:
long F(int n) {
long a = 1;
long b = 1;
long c = 1;
for(int x = 3; x <= n; x++) {
a = b;
b = c;
c = ...
}
return c;
}
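One way to complete that pattern, sketched here in C++ with only the previous two values kept, is shown below. The names F_iter, prev and cur are chosen for this example. The values grow roughly like a factorial, so a 64-bit result is only meaningful for small n.

```cpp
#include <cassert>

// Iterative F(n) = n * F(n-1) + (n-1) * F(n-2), with F(n) = 1 for n <= 2.
long long F_iter(int n) {
    assert(n >= 0);
    long long prev = 1; // F(x - 2), starts as F(1)
    long long cur = 1;  // F(x - 1), starts as F(2)
    for (int x = 3; x <= n; ++x) {
        const long long next = x * cur + (x - 1) * prev;
        prev = cur;
        cur = next;
    }
    return cur;
}
```

For n = 3 and n = 4 this gives 5 and 23, the same values as the hand simulation earlier in the thread.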
Just for fun -- solving the recurrence relation with Wolfram Alpha, we get:
F(n) = (2 * factorial(n + 2) - 5 * subfactorial(n + 2)) / (n + 1)
Which we can calculate as:
long F(int n) {
long p = 1;
long q = 1;
for (int i = 1; i <= n + 2; i++) {
p *= i;
q = q * i + (1 - (i % 2) * 2);
}
return (2 * p - 5 * q) / (n + 1);
}

OpenCL Cholesky Decomposition

I implemented the following Cholesky decomposition algorithm using OpenCL. The code is exhibiting random behavior: it matches the CPU output only sometimes. Can someone please help me figure out what is wrong with my implementation?
Here is the algorithm:
procedure CHOLESKY(A)
int i, j, k;
for k := 0 to n − 1 do /* 1st loop */
/* Obtain the square root of the diagonal element. */
A[k, k] := sqrt(A[k, k]);
for j := k + 1 to n − 1 do /* 2nd loop */
/* The division step. */
A[k, j] := A[k, j]/A[k, k];
end for
for i := k + 1 to n − 1 do /* 3rd loop */
for j := i to n − 1 do /* 4th loop */
/* The elimination step. */
A[i, j] := A[i, j] - A[k, i] × A[k, j];
end for
end for
end for
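For comparison with the kernel output, a sequential CPU version of this pseudocode might look like the following C++ sketch. The function name cholesky_reference and the row-major std::vector<float> layout are assumptions for the example (the layout mirrors the k * num_rows + j indexing used in the kernel).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// In-place, row-major n x n matrix. On return, the upper triangle holds U
// with A = U^T * U; the strict lower triangle is left untouched.
void cholesky_reference(std::vector<float>& A, int n) {
    assert(static_cast<int>(A.size()) == n * n);
    for (int k = 0; k < n; ++k) {
        // Square root of the diagonal element.
        A[k * n + k] = std::sqrt(A[k * n + k]);
        // Division step.
        for (int j = k + 1; j < n; ++j)
            A[k * n + j] /= A[k * n + k];
        // Elimination step.
        for (int i = k + 1; i < n; ++i)
            for (int j = i; j < n; ++j)
                A[i * n + j] -= A[k * n + i] * A[k * n + j];
    }
}
```

For the 2 x 2 matrix with rows (4 2) and (2 3), the upper triangle becomes (2 1) and (sqrt(2)).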
Methodology to parallelize the above algorithm:
From the algorithm, the elimination step is the most expensive. So I have the outermost loop in the host code, and I call the kernel within the loop. A single run of the kernel basically corresponds to a single iteration of the 3rd loop. Therefore, I launch (n-1) - (k+1) + 1 work groups. The number of work items within a work group is set to n/2. The 2nd for loop is also computed within the kernel, but I allow only the first work group to do it.
RELEVANT HOST CODE
// for a 10 X 10 matrix, MATRIX_SIZE = 10
localWorkSize[0] = MATRIX_SIZE/2;
stride = MATRIX_SIZE/2;
cl_event event;
for(k = 0; k < MATRIX_SIZE; k++)
{
int isize = (MATRIX_SIZE-1) - (k+1) + 1;
int num_blocks = isize;
if(num_blocks <= 0)
num_blocks = 1;
globalWorkSize[0] = num_blocks * WA/2;
errcode = clSetKernelArg(clKernel, 0, sizeof(int), (void *)&k);
errcode |= clSetKernelArg(clKernel, 1, sizeof(cl_mem), (void *)&d_A);
errcode |= clSetKernelArg(clKernel, 2, sizeof(int), (void *)&stride);
errcode = clEnqueueNDRangeKernel(clCommandQueue,
clKernel, 1, NULL, globalWorkSize,
localWorkSize, 0, NULL, &event);
OpenCL_CheckError(errcode, "clEnqueueNDRangeKernel");
clFinish(clCommandQueue);
}
KERNEL CODE
__kernel void
batchedCholesky(__global float *U, int k, int stride)
{
int tx = get_global_id(0);
unsigned int j;
unsigned int num_rows = MATRIX_SIZE;
if(tx==0)
{
// Take the square root of the diagonal element
U[k * num_rows + k] = sqrt(U[k * num_rows + k]);
}
barrier(CLK_GLOBAL_MEM_FENCE);
int offset = (k+1); //From original loop
int jstart = get_local_id(0) + offset;
int jstep = stride;
int jtop = num_rows - 1;
int jbottom = (k + 1);
//Do work for this i iteration
//Division step
if(get_group_id(0) == 0)
{
for(j = jstart; (j >= jbottom) && (j <= jtop); j+=jstep)
{
U[k * num_rows + j] /= U[k * num_rows + k]; // Division step
}
}
barrier(CLK_GLOBAL_MEM_FENCE);
j = 0;
int i = get_group_id(0) + (k+1);
offset = i;
jstart = get_local_id(0) + offset;
jbottom = i;
for( j = jstart; j >= jbottom && j <= jtop; j += jstep)
U[i * num_rows + j] -= U[k * num_rows + i] * U[k * num_rows + j];
barrier(CLK_GLOBAL_MEM_FENCE);
}
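Not all of your work items execute at the same time; they may run in batches. barrier(CLK_GLOBAL_MEM_FENCE) only synchronizes the work items within a single work group, not across work groups. So a work group can start the elimination step before the first work group has finished the square root and division steps, and it then reads stale values. That may be the source of your errors.
If you require global synchronization, use multiple kernels, for example one kernel for the square root and division steps and a second one for the elimination step.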
