Fast Iterative GCD

Fast Iterative GCD - algorithm

I need GCD(n, i) for every i from 1 to n, where i increases by 1 in a loop. Is there an algorithm that computes all of these GCDs faster than naively running the Euclidean algorithm for each i?
PS: I've noticed that if n is prime, GCD(n, i) is 1 for every i from 1 to n-1, because a prime is coprime to every smaller positive integer. Any ideas for non-prime n?

C++ implementation, works in O(n * log log n) (assuming operations on integers take O(1)):
#include <cstdio>
#include <cstring>
using namespace std;
void find_gcd(int n, int *gcd) {
// divisor[x] - any prime divisor of x
// or 0 if x == 1 or x is prime
int *divisor = new int[n + 1];
memset(divisor, 0, (n + 1) * sizeof(int));
// This is almost a copy-paste of the sieve of Eratosthenes, but instead of
// just marking a number as 'non-prime' we remember one of its prime divisors.
// O(n * log log n)
for (int x = 2; x * x <= n; ++x) {
if (divisor[x] == 0) {
for (int y = x * x; y <= n; y += x) {
divisor[y] = x;
}
}
}
for (int x = 1; x <= n; ++x) {
if (n % x == 0) gcd[x] = x;
else if (divisor[x] == 0) gcd[x] = 1; // x is prime, and does not divide n (previous line)
else {
int a = x / divisor[x], p = divisor[x]; // x == a * p
// gcd(a * p, n) = gcd(a, n) * gcd(p, n / gcd(a, n))
// gcd(p, n / gcd(a, n)) == 1 or p
gcd[x] = gcd[a];
if ((n / gcd[a]) % p == 0) gcd[x] *= p;
}
}
delete[] divisor; // release the sieve table
}
int main() {
int n;
scanf("%d", &n);
int *gcd = new int[n + 1];
find_gcd(n, gcd);
for (int x = 1; x <= n; ++x) {
printf("%d:\t%d\n", x, gcd[x]);
}
return 0;
}
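To make the recurrence in the comments easier to check, here is a minimal Python sketch of the same idea, compared against math.gcd in the usage note. The function name all_gcds is made up for this illustration; the logic is meant to mirror find_gcd above and is not a tuned implementation.

```python
from math import gcd

def all_gcds(n):
    """Return a list g where g[x] == gcd(x, n) for 1 <= x <= n (g[0] is unused)."""
    # divisor[x]: some prime divisor of x, or 0 if x is 1 or prime
    divisor = [0] * (n + 1)
    x = 2
    while x * x <= n:
        if divisor[x] == 0:
            for y in range(x * x, n + 1, x):
                divisor[y] = x
        x += 1
    g = [0] * (n + 1)
    for x in range(1, n + 1):
        if n % x == 0:
            g[x] = x                      # x divides n
        elif divisor[x] == 0:
            g[x] = 1                      # x is prime and does not divide n
        else:
            p = divisor[x]
            a = x // p                    # x == a * p, and g[a] is already known
            g[x] = g[a]
            if (n // g[a]) % p == 0:      # gcd(p, n / gcd(a, n)) is p, not 1
                g[x] *= p
    return g
```

For example, all(all_gcds(12)[x] == gcd(x, 12) for x in range(1, 13)) should hold.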

SUMMARY
The possible answers for the gcd consist of the factors of n.
You can compute these efficiently as follows.
ALGORITHM
First factorise n into a product of prime factors, i.e. n=p1^n1*p2^n2*..*pk^nk.
Then you can loop over all factors of n and for each factor of n set the contents of the GCD array at that position to the factor.
If you make sure that the factors are done in a sensible order (e.g. sorted) you should find that the array entries that are written multiple times will end up being written with the highest value (which will be the gcd).
CODE
Here is some Python code to do this for the number 1400=2^3*5^2*7:
prime_factors = [2, 5, 7]
prime_counts = [3, 2, 1]
N = 1
for prime, count in zip(prime_factors, prime_counts):
    N *= prime**count
GCD = [0] * (N + 1)
GCD[0] = N
def go(i, n):
    """Try all counts for prime_factors[i]"""
    if i == len(prime_factors):
        # n is now a factor of N: write it at every multiple of n
        for x in range(n, N + 1, n):
            GCD[x] = n
        return
    n2 = n
    for c in range(prime_counts[i] + 1):
        go(i + 1, n2)
        n2 *= prime_factors[i]
go(0, 1)
print(N, GCD)

Binary GCD algorithm:
https://en.wikipedia.org/wiki/Binary_GCD_algorithm
is faster than Euclidean algorithm:
https://en.wikipedia.org/wiki/Euclidean_algorithm
I implemented "gcd()" in C for type "__uint128_t" (with gcc on Intel i7 Ubuntu), based on iterative Rust version:
https://en.wikipedia.org/wiki/Binary_GCD_algorithm#Iterative_version_in_Rust
The number of trailing zeros is determined efficiently with "__builtin_ctzll()". I benchmarked 1 million loops on the two biggest 128-bit Fibonacci numbers (they result in the maximal number of iterations) against gmplib "mpz_gcd()" and saw a 10% slowdown. Using the fact that the u/v values only decrease, I switch to the 64-bit special case "_gcd()" once both values are "<= UINT64_MAX", and now see a speedup of 1.31 over gmplib. For details see:
https://www.raspberrypi.org/forums/viewtopic.php?f=33&t=311893&p=1873552#p1873552
#include <stdint.h> /* UINT64_MAX */
static inline int ctz(__uint128_t u)
{
unsigned long long h = u;
return (h!=0) ? __builtin_ctzll( h )
: 64 + __builtin_ctzll( u>>64 );
}
unsigned long long _gcd(unsigned long long u, unsigned long long v)
{
for(;;) {
if (u > v) { unsigned long long a=u; u=v; v=a; }
v -= u;
if (v == 0) return u;
v >>= __builtin_ctzll(v);
}
}
__uint128_t gcd(__uint128_t u, __uint128_t v)
{
if (u == 0) { return v; }
else if (v == 0) { return u; }
int i = ctz(u); u >>= i;
int j = ctz(v); v >>= j;
int k = (i < j) ? i : j;
for(;;) {
if (u > v) { __uint128_t a=u; u=v; v=a; }
/* widen before shifting: the result can need more than 64 bits */
if (v <= UINT64_MAX) return (__uint128_t)_gcd(u, v) << k;
v -= u;
if (v == 0) return u << k;
v >>= ctz(v);
}
}

Related

Convert a number m to n using minimum number of given operations

Question:
Given 2 integers N and M. Convert a number N to M using minimum number of given operations.
The operations are:
Square N (N = N^2)
Divide N by a prime integer P if N is divisible by P (N = N / P and N % P == 0)
Constraints:
N, M <= 10^9
Example:
N = 12, M = 18
The minimum operations are:
N /= 2 -> N = 6
N = N^2 -> N = 36
N /= 2 -> N = 18
My take:
I'm trying to use BFS to solve this problem. For each number, the edges to other numbers are the available operations. But it got Time Limit Exceeded. Is there a better way to solve this?
Here is my BFS code:
queue<pair<int,int> > q;
vector<long long> pr;
ll m,n;
bool prime[MAXN+1];
void solve()
{
while (!q.empty())
{
pii x=q.front();
q.pop();
if (x.first==m)
{
cout << x.second;
return;
}
if (x.first==1) continue;
for(ll k:pr)
{
if (k>x.first) break;
if (x.first%k==0) q.push({x.first/k,x.second+1});
}
q.push({x.first*x.first,x.second+1});
}
}

The algorithm decomposes N and M into prime factors, keeping track of the corresponding exponents.
If M has a prime factor that N does not have, there is no solution (the code returns -1).
If N has some prime factors that M doesn't have, then the first step is to divide N by these primes.
The corresponding number of operations is the sum of the corresponding exponents.
At this stage, we get two arrays A and B corresponding to the exponents of the common prime factors, for N and M.
It is worth noting that at this stage, the values of the primes involved is not relevant anymore, only the exponents matter.
Then one must determine the minimum number of squares (= multiplications by 2 of the exponents).
This is the smallest k such that 2^k A[i] >= B[i] for all indices i.
The number of multiplications is added to the number of operations only once, as all exponents are multiplied by 2 at the same time.
Last step is to determine, for each pair (a, b) = (A[i], B[i]), the number of subtractions needed to go from a to b, while implementing exactly k multiplications by 2. This is performed with the following rules:
- if (k == 0) f(a, b, k) = a-b
- Else:
- if (a-1)*2^k >= b: f(a, b, k) = 1 + f(a-1, b, k)
- else: f(a, b, k) = f(2*a, b, k-1)
The complexity is dominated by the decomposition into prime factors by trial division: O(sqrt(n)), where n is the larger of N and M.
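The exponent rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name min_ops_on_exponents is hypothetical, and A and B are assumed to already hold the exponents (all >= 1) of the primes common to N and M, so the first step (removing primes that only N has) is not included.

```python
def min_ops_on_exponents(A, B):
    """Squarings plus divisions needed to turn exponents A into exponents B."""
    # Smallest k such that 2**k * A[i] >= B[i] for every i (number of squarings).
    k = 0
    while any((a << k) < b for a, b in zip(A, B)):
        k += 1

    def n_sub(a, b, k):
        # Number of subtractions to go from a to b with exactly k doublings.
        if k == 0:
            return a - b
        if (a - 1) * 2**k >= b:
            return 1 + n_sub(a - 1, b, k)   # cheaper to subtract before doubling
        return n_sub(2 * a, b, k - 1)       # must double now

    return k + sum(n_sub(a, b, k) for a, b in zip(A, B))
```

For the example N = 12 = 2^2 * 3 and M = 18 = 2 * 3^2, the exponent arrays are A = [2, 1] and B = [1, 2], and min_ops_on_exponents([2, 1], [1, 2]) gives 3, matching the three operations listed in the question.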
Code:
This code is rather long, but a large part of it consists of helper routines needed for debugging and analysis.
#include <iostream>
#include <vector>
#include <string>   // std::string in print()
#include <cstdlib>  // exit()
#include <cmath>
#include <algorithm>
void print (const std::vector<int> &v, const std::string s = "") {
std::cout << s;
for (auto &x: v) {
std::cout << x << " ";
}
std::cout << std::endl;
}
void print_decomp (int n, const std::vector<int> &primes, const std::vector<int> &mult) {
std::cout << n << " = ";
int k = primes.size();
for (int i = 0; i < k; ++i) {
std::cout << primes[i];
if (mult[i] > 1) std::cout << "^" << mult[i];
std::cout << " ";
}
std::cout << "\n";
}
void prime_decomp (int nn, std::vector<int> &primes, std::vector<int> &mult) {
int n = nn;
if (n <= 1) return;
if (n % 2 == 0) {
primes.push_back(2);
int cpt = 1;
n/= 2;
while (n%2 == 0) {n /= 2; cpt++;}
mult.push_back (cpt);
}
int max_prime = sqrt(n);
int p = 3;
while (p <= max_prime) {
if (n % p == 0) {
primes.push_back(p);
int cpt = 1;
n/= p;
while (n%p == 0) {n /= p; cpt++;}
mult.push_back (cpt);
max_prime = sqrt(n);
}
p += 2;
}
if (n != 1) {
primes.push_back(n);
mult.push_back (1);
}
print_decomp (nn, primes, mult);
}
// Determine the number of subtractions to go from a to b, with exactly k multiplications by 2
int n_sub (int a, int b, int k, int power2) {
if (k == 0){
if (b > a) exit(1);
return a - b;
}
//if (a == 1) return n_sub (2*a, b, k-1, power2/2);
if ((a-1)*power2 >= b) {
return 1 + n_sub(a-1, b, k, power2);
} else {
return n_sub (2*a, b, k-1, power2/2);
}
return 0;
}
// A return of -1 means no possibility
int n_operations (int N, int M) {
int count = 0;
if (N == M) return 0;
if (N == 1) return -1;
std::vector<int> primes_N, primes_M, expon_N, expon_M;
// Prime decomposition
prime_decomp(N, primes_N, expon_N);
prime_decomp (M, primes_M, expon_M);
// Compare decomposition, check if a solution can exist, set up two exponent arrays
std::vector<int> A, B;
int index_A = 0, index_B = 0;
int nA = primes_N.size();
int nB = primes_M.size();
while (true) {
if ((index_A == nA) && (index_B == nB)) {
break;
}
if ((index_A < nA) && (index_B < nB)) {
if (primes_N[index_A] == primes_M[index_B]) {
A.push_back(expon_N[index_A]);
B.push_back(expon_M[index_B]);
index_A++; index_B++;
continue;
}
if (primes_N[index_A] < primes_M[index_B]) {
count += expon_N[index_A];
index_A++;
continue;
}
return -1; // M has a prime that N doesn't have: impossibility to go to M
}
if (index_B != nB) { // impossibility
return -1;
}
for (int i = index_A; i < nA; ++i) {
count += expon_N[i]; // suppression of primes in N not in M
}
break;
}
std::cout << "1st step, count = " << count << "\n";
print (A, "exponents of N: ");
print (B, "exponents of M: ");
// Determination of the number of multiplications by two of the exponents (= number of squares)
int n = A.size();
int n_mult2 = 0;
int power2 = 1;
for (int i = 0; i < n; ++i) {
while (power2*A[i] < B[i]) {
power2 *= 2;
n_mult2++;
}
}
count += n_mult2;
std::cout << "number of squares = " << n_mult2 << " -> " << power2 << "\n";
// For each pair of exponent, determine the number of subtractions,
// with a fixed number of multiplication by 2
for (int i = 0; i < n; ++i) {
count += n_sub (A[i], B[i], n_mult2, power2);
}
return count;
}
int main() {
int N, M;
std::cin >> N >> M;
auto ans = n_operations (N, M);
std::cout << ans << "\n";
return 0;
}

Count integer partitions with k parts, each below some threshold m

I want to count the number of ways we can partition the number n into k distinct parts, where each part is not larger than m.
For k := 2 I have the following algorithm:
public int calcIntegerPartition(int n, int k, int m) {
int cnt=0;
for(int i=1; i <= m;i++){
for(int j=i+1; j <= m; j++){
if(i+j == n){
cnt++;
break;
}
}
}
return cnt;
}
But how can I count integer partitions with k > 2? Usually I have n > 100000, k := 40, m < 10000.
Thank you in advance.

Let's start by choosing the k largest legal numbers: m, m-1, m-2, ..., m-(k-1). This adds up to k*m - k(k-1)/2. If m < k, there are no solutions because the smallest part would be <= 0. Let's assume m >= k.
Let's say p = (km - k(k-1)/2) - n.
If p < 0, there are no solutions because the largest number we can make is less than n. Let's assume p >= 0. Note that if p = 0 there is exactly one solution, so let's assume p > 0.
Now, imagine we start by choosing the k largest distinct legal integers, and we then correct this to get a solution. Our correction involves moving values to the left (on the number line) 1 slot, into empty slots, exactly p times. How many ways can we do this?
The smallest value to start with is m-(k-1), and it can move as far down as 1, so up to m-k times. After this, each successive value can move up to its predecessor's move.
Now the problem is, how many nonincreasing integer sequences with a max value of m-k sum to p? This is the partition problem, i.e., how many ways can we partition p into at most k parts, each at most m-k. There is no closed-form solution to this.
Someone has already written up a nice answer of this problem here (which will need slight modification to meet your restrictions):
Is there an efficient algorithm for integer partitioning with restricted number of parts?
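As a small illustration of the reduction above, here is a Python sketch that counts partitions of p into at most k parts, each at most m-k, with a simple table. The function name count_distinct_parts is made up for this example, and the table costs roughly (m-k) * k * p steps, so it is only meant for checking the idea on small inputs, not for the sizes in the question.

```python
def count_distinct_parts(n, k, m):
    """Count ways to write n as a sum of k distinct parts, each in 1..m."""
    if m < k:
        return 0
    p = k * m - k * (k - 1) // 2 - n    # total amount to shift down
    if p < 0:
        return 0
    cap = m - k                         # largest allowed shift of a single part
    # ways[j][s]: partitions of s into exactly j parts, using part sizes seen so far
    ways = [[0] * (p + 1) for _ in range(k + 1)]
    ways[0][0] = 1
    for size in range(1, cap + 1):
        for j in range(1, k + 1):
            for s in range(size, p + 1):
                # add the partitions that contain at least one part equal to size
                ways[j][s] += ways[j - 1][s - size]
    # "at most k parts": sum over every possible number of nonzero shifts
    return sum(ways[j][p] for j in range(k + 1))
```

For instance, count_distinct_parts(10, 3, 7) counts {1,2,7}, {1,3,6}, {1,4,5} and {2,3,5}.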

As @Dave alludes to, there is already a really nice answer to the simple restricted integer case (found here, same link as @Dave's: Is there an efficient algorithm for integer partitioning with restricted number of parts?).
Below is a variant in C++ which takes into account the maximum value of each restricted part. First, here is the workhorse:
#include <vector>
#include <algorithm>
#include <iostream>
int width;
int blockSize;
static std::vector<double> memoize;
double pStdCap(int n, int m, int myMax) {
if (myMax * m < n || n < m) return 0;
if (myMax * m == n || n <= m + 1) return 1;
if (m < 2) return m;
const int block = myMax * blockSize + (n - m) * width + m - 2;
if (memoize[block]) return memoize[block];
int niter = n / m;
if (m == 2) {
if (myMax * 2 >= n) {
myMax = std::min(myMax, n - 1);
return niter - (n - 1 - myMax);
} else {
return 0;
}
}
double count = 0;
for (; niter--; n -= m, --myMax) {
count += (memoize[myMax * blockSize + (n - m) * width + m - 3] = pStdCap(n - 1, m - 1, myMax));
}
return count;
}
As you can see, pStdCap is very similar to the linked solution. The one noticeable difference is the 2 additional checks at the top:
if (myMax * m < n || n < m) return 0;
if (myMax * m == n || n <= m + 1) return 1;
And here is the function that sets up the recursion:
double CountPartLenCap(int n, int m, int myMax) {
if (myMax * m < n || n < m) return 0;
if (myMax * m == n || n <= m + 1) return 1;
if (m < 2) return m;
if (m == 2) {
if (myMax * 2 >= n) {
myMax = std::min(myMax, n - 1);
return n / m - (n - 1 - myMax);
} else {
return 0;
}
}
width = m;
blockSize = m * (n - m + 1);
memoize = std::vector<double>((myMax + 1) * blockSize, 0.0);
return pStdCap(n, m, myMax);
}
Explanation of the parameters:
n is the integer that you are partitioning
m is the length of each partition
myMax is the maximum value that can appear in a given partition. (the OP refers to this as the threshold)
Here is a live demonstration https://ideone.com/c3WohV
And here is a non memoized version of pStdCap which is a bit easier to understand. This is originally found in this answer to Is there an efficient way to generate N random integers in a range that have a given sum or average?
int pNonMemoStdCap(int n, int m, int myMax) {
if (myMax * m < n) return 0;
if (myMax * m == n) return 1;
if (m < 2) return m;
if (n < m) return 0;
if (n <= m + 1) return 1;
int niter = n / m;
int count = 0;
for (; niter--; n -= m, --myMax) {
count += pNonMemoStdCap(n - 1, m - 1, myMax);
}
return count;
}
If you actually intend to calculate the number of partitions for numbers as large as 10000, you are going to need a big int library as CountPartLenCap(10000, 40, 300) > 3.2e37 (Based off the OP's requirement).

Count of co-prime pairs from two arrays in less than O(n^2) complexity

I came across this problem in a challenge.
There are two arrays A and B, both of size N, and we need to return the count of pairs (A[i],B[j]) where gcd(A[i],B[j])==1 and A[i] != B[j].
I could only think of a brute-force approach, which exceeded the time limit for a few test cases.
for(int i=0; i<n; i++) {
for(int j=0; j<n; j++) {
if(__gcd(a[i],b[j])==1) {
printf("%d %d\n", a[i], b[j]);
}
}
}
Can you advise a time-efficient algorithm to solve this?
Edit: Not able to share question link as this was from a hiring challenge. Adding the constraints and input/output format as I remember.
Input -
First line will contain N, the number of elements present in both arrays.
Second line will contain N space separated integers, elements of array A.
Third line will contain N space separated integers, elements of array B.
Output -
The count of pairs A[i],B[j] as per the conditions.
Constraints -
1 <= N <= 10^5
1 < A[i],B[j] <= 10^9 where i,j < N

The first step is to use Eratosthenes sieve to calculate the prime numbers up to sqrt(10^9). This sieve can then be used to quickly find all prime factors of any number less than 10^9 (see the getPrimeFactors(...) function in the code sample below).
Next, for each A[i] with prime factors p0, p1, ..., pk, we compute all possible sub-products X - p0, p1, p0p1, p2, p0p2, p1p2, p0p1p2, p3, p0p3, ..., p0p1p2...pk and count them in map cntp[X]. Effectively, the map cntp[X] tells us the number of elements A[i] divisible by X, where X is a product of prime numbers to the power of 0 or 1. So for example, for the number A[i] = 12, the prime factors are 2, 3. We will count cntp[2]++, cntp[3]++ and cntp[6]++.
Finally, for each B[j] with prime factors p0, p1, ..., pk, we again compute all possible sub-products X and use the Inclusion-exclusion principle to count all non-coprime pairs C_j (i.e. the number of A[i]s that share at least one prime factor with B[j]). The numbers C_j are then subtracted from the total number of pairs - N*N to get the final answer.
Note: the Inclusion-exclusion principle looks like this:
C_j = (cntp[p0] + cntp[p1] + ... + cntp[pk]) -
(cntp[p0p1] + cntp[p0p2] + ... + cntp[pk-1pk]) +
(cntp[p0p1p2] + cntp[p0p1p3] + ... + cntp[pk-2pk-1pk]) -
...
and accounts for the fact that in cntp[X] and cntp[Y] we could have counted the same number A[i] twice, given that it is divisible by both X and Y.
Here is a possible C++ implementation of the algorithm, which produces the same results as the naive O(n^2) algorithm by OP:
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>
// get prime factors of a using pre-generated sieve
std::vector<int> getPrimeFactors(int a, const std::vector<int> & primes) {
std::vector<int> f;
for (auto p : primes) {
if (p > a) break;
if (a % p == 0) {
f.push_back(p);
do {
a /= p;
} while (a % p == 0);
}
}
if (a > 1) f.push_back(a);
return f;
}
// find coprime pairs A_i and B_j
// A_i and B_i <= 1e9
void solution(const std::vector<int> & A, const std::vector<int> & B) {
// generate prime sieve
std::vector<int> primes;
primes.push_back(2);
for (int i = 3; i*i <= 1e9; ++i) {
bool isPrime = true;
for (auto p : primes) {
if (i % p == 0) {
isPrime = false;
break;
}
}
if (isPrime) {
primes.push_back(i);
}
}
int N = A.size();
struct Entry {
int n = 0;
int64_t p = 0;
};
// cntp[X] - number of times the product X can be expressed
// with prime factors of A_i
std::map<int64_t, int64_t> cntp;
for (int i = 0; i < N; i++) {
auto f = getPrimeFactors(A[i], primes);
// count possible products using non-repeating prime factors of A_i
std::vector<Entry> x;
x.push_back({ 0, 1 });
for (auto p : f) {
int k = x.size();
for (int i = 0; i < k; ++i) {
int nn = x[i].n + 1;
int64_t pp = x[i].p*p;
++cntp[pp];
x.push_back({ nn, pp });
}
}
}
// use Inclusion–exclusion principle to count non-coprime pairs
// and subtract them from the total number of pairs N*N
int64_t cnt = N; cnt *= N;
for (int i = 0; i < N; i++) {
auto f = getPrimeFactors(B[i], primes);
std::vector<Entry> x;
x.push_back({ 0, 1 });
for (auto p : f) {
int k = x.size();
for (int i = 0; i < k; ++i) {
int nn = x[i].n + 1;
int64_t pp = x[i].p*p;
x.push_back({ nn, pp });
if (nn % 2 == 1) {
cnt -= cntp[pp];
} else {
cnt += cntp[pp];
}
}
}
}
// the count can be as large as N*N = 1e10, which does not fit in an int
printf("cnt = %lld\n", (long long) cnt);
}
Live example
I cannot estimate the complexity analytically, but here are some profiling results on my laptop for different N and uniformly random A[i] and B[j]:
For N = 1e2, takes ~0.02 sec
For N = 1e3, takes ~0.05 sec
For N = 1e4, takes ~0.38 sec
For N = 1e5, takes ~3.80 sec
For comparison, the O(n^2) approach takes:
For N = 1e2, takes ~0.00 sec
For N = 1e3, takes ~0.15 sec
For N = 1e4, takes ~15.1 sec
For N = 1e5, takes too long, didn't wait to finish

Python Implementation:
import math
from collections import defaultdict
def sieve(MAXN):
    # spf[x] = smallest prime factor of x
    spf = [0 for i in range(MAXN)]
    spf[1] = 1
    for i in range(2, MAXN):
        spf[i] = i
    for i in range(4, MAXN, 2):
        spf[i] = 2
    for i in range(3, math.ceil(math.sqrt(MAXN))):
        if (spf[i] == i):
            for j in range(i * i, MAXN, i):
                if (spf[j] == j):
                    spf[j] = i
    return(spf)
def getFactorization(x,spf):
    # distinct prime factors of x
    ret = list()
    while (x != 1):
        ret.append(spf[x])
        x = x // spf[x]
    return(list(set(ret)))
def coprime_pairs(N,A,B):
    MAXN=max(max(A),max(B))+1
    spf=sieve(MAXN)
    cntp=defaultdict(int)
    for i in range(N):
        f=getFactorization(A[i],spf)
        x=[[0,1]]
        for p in f:
            k=len(x)
            for j in range(k):
                nn=x[j][0]+1
                pp=x[j][1]*p
                cntp[pp]+=1
                x.append([nn,pp])
    cnt=0
    for i in range(N):
        f=getFactorization(B[i],spf)
        x=[[0,1]]
        for p in f:
            k=len(x)
            for j in range(k):
                nn=x[j][0]+1
                pp=x[j][1]*p
                x.append([nn,pp])
                if(nn%2==1):
                    cnt+=cntp[pp]
                else:
                    cnt-=cntp[pp]
    return(N*N-cnt)
import random
N=10001
A=[random.randint(1,N) for _ in range(N)]
B=[random.randint(1,N) for _ in range(N)]
print(coprime_pairs(N,A,B))

How to count the numbers that are divisible by their sum of digits?

The following is a question from HackerEarth.
I coded its solution in Java and C but got Time Limit Exceeded for some test cases on submission. No participant was able to solve this for all test cases. What is the most efficient solution for this?
QUESTION:
Bob likes DSD Numbers. DSD Number is a number which is divisible by its
Digit Sum in Decimal Representation.
digitSum(n) : Sum of digits of n (in Decimal Representation)
eg: n = 1234 then digitSum(n) = 1 + 2 + 3 + 4 = 10
DSD Number is number n such that n % digitSum(n) equal to 0
Bob asked Alice to tell the number of DSD Numbers in range [L,R]
inclusive.
Constraints:
1 <= test cases <= 50
1<=L<=R<=10^9
Sample Input
4
2 5
1 10
20 45
1 100
Sample Output
4
10
9
33
Code in Java:
class DSD {
public static void main(String[] args) throws IOException {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
PrintWriter out=new PrintWriter(System.out);
int t=Integer.parseInt(br.readLine());
while(t-->0){
StringTokenizer st=new StringTokenizer(br.readLine());
int L=Integer.parseInt(st.nextToken());
int R=Integer.parseInt(st.nextToken());
int count=0,sum=0,i=L,j=0;
while(i>0){
sum+=i%10;
i=i/10;
}
if(L%sum==0)
count++;
for(i=L+1;i<=R;i++){
if(i%10!=0){
sum+=1;
}
else
{
j=i;
while(j%10==0){
sum-=9;
j/=10;
}
sum+=1;
}
if(i%sum==0)
count++;
}
out.println(count);
}
out.close();
}
}

We can solve this problem by using dynamic programming.
Observation:
There are at most 10 digits in each number, and the digit sum is at most 81 (reached by 999,999,999), so it is always less than 100.
So, assuming that we know the sum of digit for one number, by processing digit by digit, we have four things to check:
Whether the current number is larger than the lower bound.
Whether the current number is smaller than the upper bound.
What is the mod of current number with its sum.
What is the current sum of all digits.
We come up with this function int count(int digit, boolean larger, boolean smaller, int left, int mod), and then, the dp state: dp[digit][larger][smaller][left][mod].
For each test case, the time complexity is (number of possible sums)^3 x (number of digits) = 100^3 * 10 = 10^7.
There are 50 test cases -> 50 * 10^7 = 5 * 10^8 operations, which should still be within the time limit.
Java code:
static int[][][][][] dp;
static int[][][][][] check;
static int cur = 0;
public static void main(String[] args) throws FileNotFoundException {
// PrintWriter out = new PrintWriter(new FileOutputStream(new File(
// "output.txt")));
PrintWriter out = new PrintWriter(System.out);
Scanner in = new Scanner(System.in);
int n = in.nextInt();
dp = new int[11][2][2][82][82];
check = new int[11][2][2][82][82];
for (int i = 0; i < n; i++) {
int l = in.nextInt();
int r = in.nextInt();
String L = "" + l;
String R = "" + r;
while (L.length() < R.length()) {
L = "0" + L;
}
int result = 0;
for (int j = 1; j <= 81; j++) {
cur = cur + 1;
result += count(0, 0, 0, j, 0, j, L, R);
}
out.println(result);
}
out.close();
}
public static int count(int index, int larger, int smaller, int left,
int mod, int sum, String L, String R) {
if (index == L.length()) {
if (left == 0 && mod == 0) {
return 1;
}
return 0;
}
if((L.length() - index) * 9 < left){
return 0;
}
if (check[index][larger][smaller][left][mod] == cur) {
return dp[index][larger][smaller][left][mod];
}
//System.out.println(cur);
check[index][larger][smaller][left][mod] = cur;
int x = L.charAt(index) - '0';
int y = R.charAt(index) - '0';
int result = 0;
for (int i = 0; i < 10 && i <= left; i++) {
if (x > i && larger == 0) {
continue;
}
if (y < i && smaller == 0) {
continue;
}
int nxtLarger = larger;
int nxtSmaller = smaller;
if (x < i) {
nxtLarger = 1;
}
if (y > i) {
nxtSmaller = 1;
}
int nxtMod = (mod * 10 + i) % sum;
result += count(index + 1, nxtLarger, nxtSmaller, left - i, nxtMod,
sum, L, R);
}
return dp[index][larger][smaller][left][mod] = result;
}
Update: I have submitted and passed all the test cases for this problem, (2nd person who solved this) This is the link of my submission

Let f (L, R) = "number of integers L ≤ x ≤ R where x is divisible by the sum of its digits". We define that x = 0 is not counted.
Let g (M) = "number of integers 1 ≤ x < M where x is divisible by the sum of its digits". We have f (L, R) = g (R + 1) - g (L).
Find the largest k ≥ 0 such that 10^k <= M. Find the largest a ≥ 1 such that a * 10^k <= M. All integers < M have at most 9k + (a-1) as sum of digits.
Let h (M, n) = "number of integers 1 ≤ x < M where x is divisible by n, and the sum of digits is n". g (M) is the sum of h (M, n) for 1 ≤ n ≤ 9*k + (a - 1).
Let r (a, k, n) = "number of integers a*10^k ≤ x < (a+1)*10^k where x is divisible by n, and the sum of digits is n". h (M, n) can be calculated by adding values of r (a, k, n) in an obvious way; for example:
h (1,234,000,000, n) = r (0, 9, n) + r (10, 8, n) + r (11, 8, n) + r (120, 7, n) + r (121, 7, n) + r (122, 7, n) + r (1230, 6, n) + r (1231, 6, n) + r (1232, 6, n) + r (1233, 6, n).
Let f (k, n, d, m) = "number of integers 0 ≤ x < 10^k where the sum of digits is d, and x % n = m". We can calculate r (a, k, n) using this function: The last k digits must have a digit sum of n - digitsum (a). If the whole number is divisible by n, then the last k digits must have a remainder of (- a*10^k) % n. So r (a, k, n) = f (k, n, n - digitsum(a), - (a*10^k) % n).
f (k, n, d, m) is trivial if k = 1: Only for the number d is the sum of digits equal to d, so f (1, n, d, m) is 1 if 0 ≤ d ≤ 9 and d % n = m, and 0 otherwise.
To calculate f (k+1, n, d, m) we add f (k, n, d-a, (m - a*10^k)%n) for 0 ≤ a ≤ 9. Obviously all the values f (k, n, d, m) must be stored so they are not recalculated again and again.
And that's it. How many operations: If R < 10^r, then numbers have up to r digits and digit sums of at most 9r. We calculate values f (k, n, d, m) for 1 ≤ k ≤ r, for 1 ≤ n ≤ 9r, for 0 ≤ d ≤ 9r, for 0 ≤ m < n. For each of those we add 10 different numbers, so we have fewer than 10,000 r^4 additions. So numbers up to 10^19 are no problem.
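The recurrences above translate fairly directly into a memoized Python sketch. Two things here are assumptions made for this illustration and not part of the description: the recursion bottoms out at k = 0 (the empty suffix) instead of k = 1, and the decomposition of h (M, n) walks the decimal digits of M from the left. The names g and count_dsd follow the text loosely.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(k, n, d, m):
    """Count integers 0 <= x < 10**k with digit sum d and x % n == m."""
    if d < 0 or d > 9 * k:
        return 0
    if k == 0:
        return 1 if m == 0 else 0       # only x = 0 remains, and d == 0 here
    # choose the leading digit a of the k-digit block
    return sum(f(k - 1, n, d - a, (m - a * 10 ** (k - 1)) % n) for a in range(10))

def g(M):
    """Count integers 1 <= x < M that are divisible by their digit sum."""
    digits = [int(c) for c in str(M)]
    total = 0
    for n in range(1, 9 * len(digits) + 1):
        prefix_val = prefix_sum = 0
        for pos, dig in enumerate(digits):
            k = len(digits) - pos - 1   # number of free digits after this position
            for a in range(dig):        # heads strictly below M's prefix
                head = prefix_val * 10 + a
                total += f(k, n, n - prefix_sum - a, (-head * 10 ** k) % n)
            prefix_val = prefix_val * 10 + dig
            prefix_sum += dig
    return total

def count_dsd(L, R):
    return g(R + 1) - g(L)
```

On the sample input from the question, count_dsd(2, 5), count_dsd(1, 10), count_dsd(20, 45) and count_dsd(1, 100) give 4, 10, 9 and 33.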

The following approach should take about 10^7 operations per case.
Split numbers into a prefix (n/10000) and a suffix (n%10000). Once you choose a digit sum, only a little data from each of the prefix and suffix are needed to determine if the digit sum divides the number. (This is related to some things gnasher729 said, but I get a much different running time.)
For each possible digit sum d from 1 to 81,
Map prefix p to a pair (p*10000 % d, digit sum(p)).
Tally the counts in a matrix M.
Map each possible suffix s to a pair (s % d, digit sum(s)).
Tally the counts in a matrix N.
For every (a,b),
total += M[a,b] *N[-a%d,d-b]
There are about 81 * (10^5 + 10^4) steps.
The edge cases where a prefix is partially allowed (L/10000, R/10000, and 100000) can be brute-forced in about 20000 steps once.
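Here is a rough Python sketch of the tallying step for whole prefixes only; the partially allowed prefixes mentioned in the last line would still need the brute-force pass. The function name count_dsd_full_blocks and the block parameter are assumptions made for this example (block = 10000 corresponds to the split described above, and a smaller power of ten makes it easy to test).

```python
from collections import Counter

def digit_sum(x):
    s = 0
    while x:
        s += x % 10
        x //= 10
    return s

def count_dsd_full_blocks(num_prefixes, block=10000):
    """Count DSD numbers x with 1 <= x < num_prefixes * block."""
    prefix_ds = [digit_sum(p) for p in range(num_prefixes)]
    suffix_ds = [digit_sum(s) for s in range(block)]
    # upper bound on the digit sum of any number in range
    max_d = 9 * (len(str(max(num_prefixes - 1, 0))) + len(str(block - 1)))
    total = 0
    for d in range(1, max_d + 1):
        # M: prefixes keyed by (p * block mod d, digit sum of p)
        M = Counter(((p * block) % d, prefix_ds[p]) for p in range(num_prefixes))
        # N: suffixes keyed by (s mod d, digit sum of s)
        N = Counter((s % d, suffix_ds[s]) for s in range(block))
        for (a, b), cnt in M.items():
            # the suffix must cancel the remainder and supply the missing digit sum
            total += cnt * N.get(((-a) % d, d - b), 0)
    return total
```

With block=100 and a single prefix (0), this counts the DSD numbers from 1 to 99.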

Interesting problem. Straightforward solution would be to iterate through the numbers from L to R, calculate the sum of digits for each, and check for each whether the number is divisible by the sum of digits.
Calculating the sum of digits can be made faster obviously. The numbers xxx0, xxx1, xxx2, ..., xxx9 have digit sums n, n+1, n+2, ..., n+9. So for ten consecutive numbers almost no effort is needed to calculate the digit sum, just a modulo operation to check for divisibility.
The modulo check can be made faster. Compilers use clever tricks to divide by constants, replacing a slow division with a shift and a multiplication. You can search for how this is done, and since there are only 81 possible divisors, do at runtime what the compiler would do for constants. That should get the time down to few nanoseconds per number.
To do better: I'd make a loop checking for numbers with digit sum 1, digit sum 2, etc. As an example, assume I'm checking numbers with digit sum 17. These numbers must have a digit sum of 17, and also be multiples of 17. I take the numbers from 0000 to 9999 and for each I calculate the sum of digits, and the value modulo 17, and divide them into 37 x 17 sets where all the numbers in the set have the same digit sum and the same value modulo 17 and count the elements in each set.
Then to check the numbers from 0 to 9999: I pick the set where the digit sum is 17, and the value modulo 17 is 0 and take the element count of that set. To check numbers from 10,000 to 19,999: I pick the set where the digit sum is 16, and the value modulo 17 is 13 (because 10013 is divisible by 17), and so on.
That's just the idea. I think with a bit of cleverness that can be extended to a method that takes O (log^4 R) steps to handle all the numbers from L to R.

In the C code below, I have focused on the core portion, i.e. finding the DSD count. The code is admittedly ugly, but that's what you get when coding in a hurry.
The basic observation is that the digit sum can be simplified by tracking the digits of the number individually, reducing the digit sum determination to simple increments/decrements in each step. There are probably clever ways to accelerate the modulo computations, I could not come up with any on the double.
On my machine (Xeon E3 1270 v2, 3.5 GHz) the code below finds the count of DSDs in [1,1e9] in 3.54 seconds. I compiled with MSVC 2010 at optimization level -O2. While you stated a time limit of 1 second in an update to your question, it is not clear that this extreme case is exercised by the framework at the website you mentioned. In any event this will provide a reasonable baseline to compare other proposed solutions against.
#include <stdio.h>
#include <stdlib.h>
/* sum digits in decimal representation of x */
int digitsum (int x)
{
int sum = 0;
while (x) {
sum += x % 10;
x = x / 10;
}
return sum;
}
/* split integer into individual decimal digits. p[0]=ones, p[1]=tens, ... */
void split (int a, int *p)
{
int i = 0;
while (a) {
p[i] = a % 10;
a = a / 10;
i++;
}
}
/* return number of DSDs in [first,last] inclusive. first, last in [1,1e9] */
int count_dsd (int first, int last)
{
int num, ds, count = 0, p[10] = {0};
num = first;
split (num, p);
ds = digitsum (num);
while (p[9] < 10) {
while (p[8] < 10) {
while (p[7] < 10) {
while (p[6] < 10) {
while (p[5] < 10) {
while (p[4] < 10) {
while (p[3] < 10) {
while (p[2] < 10) {
while (p[1] < 10) {
while (p[0] < 10) {
count += ((num % ds) == 0);
if (num == last) {
return count;
}
num++;
p[0]++;
ds++;
}
p[0] = 0;
p[1]++;
ds -= 9;
}
p[1] = 0;
p[2]++;
ds -= 9;
}
p[2] = 0;
p[3]++;
ds -= 9;
}
p[3] = 0;
p[4]++;
ds -= 9;
}
p[4] = 0;
p[5]++;
ds -= 9;
}
p[5] = 0;
p[6]++;
ds -= 9;
}
p[6] = 0;
p[7]++;
ds -= 9;
}
p[7] = 0;
p[8]++;
ds -= 9;
}
p[8] = 0;
p[9]++;
ds -= 9;
}
return count;
}
int main (void)
{
int i, first, last, *count, testcases;
scanf ("%d", &testcases);
count = malloc (testcases * sizeof(count[0]));
if (!count) return EXIT_FAILURE;
for (i = 0; i < testcases; i++) {
scanf ("%d %d", &first, &last);
count[i] = count_dsd (first, last);
}
for (i = 0; i < testcases; i++) {
printf ("%d\n", count[i]);
}
free (count);
return EXIT_SUCCESS;
}
I copied the sample inputs stated in the question into a text file testdata, and when I call the executable like so:
dsd < testdata
the output is as desired:
4
10
9
33

Solution in Java
Implement a program to find out whether a number is divisible by the sum of its digits.
Display appropriate messages.
class DivisibleBySum
{
public static void main(String[] args)
{
// Implement your code here
int num = 123;
int number = num;
int sum=0;
for(;num>0;num /=10)
{
int rem = num % 10;
sum += rem;
}
if(number %sum ==0)
System.out.println(number+" is divisible by sum of its digits");
else
System.out.println(number+" is not divisible by sum of its digits");
}
}

Sum of series: 1^1 + 2^2 + 3^3 + ... + n^n (mod m)

Can someone give me an idea of an efficient algorithm for large n (say 10^10) to find the sum of above series?
My code is getting killed for n = 100000 and m = 200000:
#include<stdio.h>
int main() {
int n,m,i,j,sum,t;
scanf("%d%d",&n,&m);
sum=0;
for(i=1;i<=n;i++) {
t=1;
for(j=1;j<=i;j++)
t=((long long)t*i)%m;
sum=(sum+t)%m;
}
printf("%d\n",sum);
}

Two notes:
(a + b + c) % m
is equivalent to
(a % m + b % m + c % m) % m
and
(a * b * c) % m
is equivalent to
((a % m) * (b % m) * (c % m)) % m
As a result, you can calculate each term using a recursive function in O(log p):
int expmod(int n, int p, int m) {
if (p == 0) return 1;
int nm = n % m;
long long r = expmod(nm, p / 2, m);
r = (r * r) % m;
if (p % 2 == 0) return r;
return (r * nm) % m;
}
And sum elements using a for loop:
long long r = 0;
for (int i = 1; i <= n; ++i)
r = (r + expmod(i, i, m)) % m;
This algorithm is O(n log n).
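For comparison, the same O(n log n) idea is very short in Python, because the built-in three-argument pow already does modular exponentiation by repeated squaring. The function name self_power_sum is just a label chosen for this sketch.

```python
def self_power_sum(n, m):
    """Return (1**1 + 2**2 + ... + n**n) % m."""
    total = 0
    for i in range(1, n + 1):
        # pow(i, i, m) computes i**i % m in O(log i) multiplications
        total = (total + pow(i, i, m)) % m
    return total
```

With the values from the question, self_power_sum(100000, 200000) finishes quickly, since each term needs only about 17 squarings.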

I think you can use Euler's theorem to avoid some exponentiation, as phi(200000)=80000. The Chinese remainder theorem might also help, as it reduces the modulus.
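A minimal Python sketch of the Euler's theorem idea, assuming the usual statement: if gcd(i, m) == 1 then i^phi(m) ≡ 1 (mod m), so the exponent can be reduced modulo phi(m). The helper names phi and self_power_mod are made up for this example; bases that share a factor with m simply fall back to the unreduced exponent.

```python
from math import gcd

def phi(m):
    """Euler's totient of m, by trial division."""
    result = m
    p = 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:                 # one prime factor larger than sqrt is left
        result -= result // m
    return result

def self_power_mod(i, m, phi_m):
    """Return i**i % m, shrinking the exponent when Euler's theorem applies."""
    if gcd(i, m) == 1:
        return pow(i, i % phi_m, m)
    return pow(i, i, m)       # theorem does not apply directly; no reduction
```

Here phi(200000) evaluates to 80000, in line with the value quoted above. The saving only matters when i is much larger than phi(m).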

You may have a look at my answer to this post. The implementation there is slightly buggy, but the idea is there. The key strategy is to find x such that n^(x-1)<m and n^x>m and repeatedly reduce n^n%m to (n^x%m)^(n/x)*n^(n%x)%m. I am sure this strategy works.

I encountered a similar question recently: my 'n' is 1435, 'm' is 10^10. Here is my solution (C#):
ulong n = 1435, s = 0, mod = 0;
mod = ulong.Parse(Math.Pow(10, 10).ToString());
for (ulong i = 1; i <= n; i++)
{
ulong summand = i;
for (ulong j = 2; j <= i; j++)
{
summand *= i;
summand = summand % mod;
}
s += summand;
s = s % mod;
}
At the end 's' is equal to required number.

Are you getting killed here:
for(j=1;j<=i;j++)
t=((long long)t*i)%m;
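Exponentials mod m could be implemented using the repeated squaring (square-and-multiply) method. The following computes n^n mod m:
n = 10000;
m = 20000;
sqr = n % m;   // current power n^(2^j) mod m
bit = n;       // remaining bits of the exponent
result = 1;
while(bit > 0)
{
if(bit % 2 == 1)
{
result = (result * sqr) % m;   // multiply in this power when the bit is set
}
sqr = (sqr * sqr) % m;
bit >>= 1;     // move on to the next bit of the exponent
}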

I can't add comment, but for the Chinese remainder theorem, see http://mathworld.wolfram.com/ChineseRemainderTheorem.html formulas (4)-(6).
