Can this problem be solved by dynamic programming?

Given n, m, d. The answer is stored in sum variable in the below code:
int x = m / d;
int sum = 0;
for (int i = 1; i <= x; i++) {
    sum += mobius(i) * ((x / i) ^ n); // ^ denotes exponentiation here, not XOR
}
Now the problem is to find the total sum % (10^9 + 7) when d varies from [l, r] with n, m as mentioned above. I have only been able to do it by brute-force, but the constraints are 1 <= n, m, l, r <= 10^7. So the brute-force solution cannot pass the time limit.
Is there some underlying overlapping subproblem and optimal substructure property to this problem which can be used to solve the problem by dynamic programming?
Here mobius(i) is the Mobius function; I have precomputed it in O(n log n).
Edit: Given t, n, m. Where t is the number of test cases,
l, r is given t times. We have to output the total sum as mentioned above.
Sample Input:
T : 2
N : 3, M : 10
Values of l and r
9 9
10 10
Sample Output:

Note that when you divide m by d to compute x, there will only be about 2*sqrt(m) unique values for x.
This means you only need to trigger the second loop for each unique value of x.
Similarly, in the computation of x/i, there will only be about 2*sqrt(x) unique values for (x/i). This means you only need to compute (x/i)^n for each unique value.
For each unique value of x/i there will be a range of i values that produce this value.
You will then need to add up mobius[i] for all the values of i that produce the same output. This can be done by preparing an array with the cumulative sum of the Mobius function (this cumulative sum is called the Mertens function).
For example, if
M[k] = sum[ Mobius(i) for i = 1..k ]
then
sum[ Mobius(i) for i = low..high ] = M[high] - M[low-1]
Overall this takes O( sqrt(m) * sqrt(m) ) = O(m) steps, each of which needs one modular exponentiation costing O(log n) (in addition to the time spent computing the Mobius function).
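To make the grouping idea concrete, here is a minimal Python sketch. The names build_mertens, inner_sum and range_sum are my own and do not come from the question; the sketch assumes the Mertens table is built once up to m and then reused for every (l, r) query.

```python
MOD = 10**9 + 7

def build_mertens(limit):
    """Return M with M[k] = mobius(1) + ... + mobius(k), using a linear sieve."""
    mu = [1] * (limit + 1)
    composite = [False] * (limit + 1)
    primes = []
    for i in range(2, limit + 1):
        if not composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > limit:
                break
            composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0  # p*p divides i*p
                break
            mu[i * p] = -mu[i]
    mertens = [0] * (limit + 1)
    for k in range(1, limit + 1):
        mertens[k] = mertens[k - 1] + mu[k]
    return mertens

def inner_sum(x, n, mertens):
    """sum of mobius(i) * (x // i)**n for i = 1..x, one term per distinct x // i."""
    total = 0
    i = 1
    while i <= x:
        q = x // i
        last = x // q  # largest i that still gives the same quotient q
        total += (mertens[last] - mertens[i - 1]) * pow(q, n, MOD)
        i = last + 1
    return total % MOD

def range_sum(n, m, l, r, mertens):
    """Total over d = l..r, one inner_sum call per distinct m // d."""
    total = 0
    d = l
    while d <= r:
        x = m // d
        if x == 0:  # every larger d contributes nothing
            break
        last = min(r, m // x)
        total += (last - d + 1) * inner_sum(x, n, mertens)
        d = last + 1
    return total % MOD
```

Typical use would be mertens = build_mertens(m) followed by range_sum(n, m, l, r, mertens) for each test case.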


Fast algorithm for sum of steps taken by the Euclidean algorithm over pairs of numbers under an upper bound

Note: This may involve a good deal of number theory, but the formula I found online is only an approximation, so I believe an exact solution requires some sort of iterative calculation by a computer.
My goal is to find an efficient algorithm (in terms of time complexity) to solve the following problem for large values of n:
Let R(a,b) be the number of steps that the Euclidean algorithm takes to find the GCD of nonnegative integers a and b. That is, R(a,b) = 1 + R(b,a%b), and R(a,0) = 0. Given a natural number n, find the sum of R(a,b) for all 1 <= a,b <= n.
For example, if n = 2, then the solution is R(1,1) + R(1,2) + R(2,1) + R(2,2) = 1 + 2 + 1 + 1 = 5.
Since there are n^2 pairs corresponding to the numbers to be added together, simply computing R(a,b) for every pair can do no better than O(n^2), regardless of the efficiency of R. Thus, to improve the efficiency of the algorithm, a faster method must somehow calculate the sum of R(a,b) over many values at once. There are a few properties that I suspect might be useful:
If a = b, then R(a,b) = 1
If a < b, then R(a,b) = 1 + R(b,a)
R(a,b) = R(ka,kb) where k is some natural number
If b <= a, then R(a,b) = R(a+b,b)
If b <= a < 2b, then R(a,b) = R(2a-b,a)
Because of the first two properties, it is only necessary to find the sum of R(a,b) over pairs where a > b. I tried using this in addition to the third property in a method that computes R(a,b) only for pairs where a and b are also coprime in addition to a being greater than b. The total sum is then n plus the sum of (n / a) * ((2 * R(a,b)) + 1) over all such pairs (using integer division for n / a). This algorithm still had time complexity O(n^2), I discovered, due to Euler's totient function being roughly linear.
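For reference, the coprime-pair reduction described above can be sketched in Python as follows. The function names are only illustrative, and this is still the O(n^2) method, not a faster one; it is just a compact way to check the formula against brute force.

```python
from math import gcd

def steps(a, b):
    """R(a, b): number of division steps taken by the Euclidean algorithm."""
    count = 0
    while b:
        a, b = b, a % b
        count += 1
    return count

def r_sum_coprime(n):
    """n plus the sum of (n // a) * (2 * R(a, b) + 1) over coprime pairs a > b."""
    total = n  # the n diagonal pairs (a, a), each with R = 1
    for a in range(2, n + 1):
        multiples = n // a  # number of pairs (k*a, k*b) that stay within the bound
        for b in range(1, a):
            if gcd(a, b) == 1:
                # R(a, b) for the pair itself plus R(b, a) = 1 + R(a, b) for its mirror
                total += multiples * (2 * steps(a, b) + 1)
    return total

def r_sum_brute(n):
    """Direct summation over all pairs, for comparison only."""
    return sum(steps(a, b) for a in range(1, n + 1) for b in range(1, n + 1))
```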
I don't need any specific code solution, I just need to figure out the procedure for a more efficient algorithm. But if the programming language matters, my attempts to solve this problem have used C++.
Side note: I have found that a formula has been discovered that nearly solves this problem, but it is only an approximation. Note that the formula calculates the average rather than the sum, so it would just need to be multiplied by n^2. If the formula could be expanded to reduce the error, it might work, but from what I can tell, I'm not sure if this is possible.
Using Stern-Brocot, due to symmetry, we can look at just one of the four subtrees rooted at 1/3, 2/3, 3/2 or 3/1. The time complexity is still O(n^2) but it obviously performs fewer calculations. The version below uses the subtree rooted at 2/3 (or at least that's the one I looked at to think through :). Also note, we only care about the denominators there since the numerators are lower. Also note the code relies on rules 2 and 3 as well.
C++ code (takes about a tenth of a second for n = 10,000):
#include <iostream>
using namespace std;

long g(int n, int l, int mid, int r, int fromL, int turns){
    long right = 0;
    long left = 0;
    if (mid + r <= n)
        right = g(n, mid, mid + r, r, 1, turns + (1^fromL));
    if (mid + l <= n)
        left = g(n, l, mid + l, mid, 0, turns + fromL);
    // Multiples
    int k = n / mid;
    // This subtree is rooted at 2/3
    return 4 * k * turns + left + right;
}

long f(int n) {
    // 1/1, 2/2, 3/3 etc.
    long total = n;
    // 1/2, 2/4, 3/6 etc.
    if (n > 1)
        total += 3 * (n >> 1);
    if (n > 2)
        // Technically 3 turns for 2/3 but
        // we can avoid a subtraction
        // per call by starting with 2. (I
        // guess that means it could be
        // another subtree, but I haven't
        // thought it through.)
        total += g(n, 2, 3, 1, 1, 2);
    return total;
}

int main() {
    cout << f(10000);
    return 0;
}
I think this is a hard problem. We can avoid division and reduce the space usage to linear at least via the Stern--Brocot tree.
def f(n, a, b, r):
    return r if a + b > n else r + f(n, a + b, b, r) + f(n, a + b, a, r + 1)

def R_sum(n):
    return sum(f(n, d, d, 1) for d in range(1, n + 1))

def R(a, b):
    return 1 + R(b, a % b) if b else 0

def test(n):
    print(sum(R(a, b) for a in range(1, n + 1) for b in range(1, n + 1)))

Sum of remainders over the entire array for several queries

I am looking at this challenge:
You are provided an array A[ ] of N elements.
Also, you have to answer M queries.
Each query is of following type-
Given a value X, find A[1]%X + A[2]%X + ...... + A[N]%X
1<=elements of array<=100000
I am having a problem in computing this value in an optimized way.
How can we compute this value for different X?
Here is a way that you could at least reduce the multiplicative factor in the time complexity.
In the C standard, the modulo (or remainder) is defined to be a % b = a - (a / b) * b (where / is integer division).
A naive, iterative way (possibly useful on embedded systems with no division unit) to compute the modulo is therefore (pseudo-code):
function remainder (A, B):
    rem = A
    while rem >= B:
        rem -= B
    return rem
But how does this help us at all? Suppose we:
Sort the array A[i] in ascending order
Pre-compute the sum of all elements in A[] -> S
Find the first element (with index I) greater than or equal to X
From the pseudocode above it is clear that at least (one multiple of) X must be subtracted from all elements in the array from index I onwards. Therefore we must subtract (N - I + 1) * X from the sum S.
Even better: we can keep a variable (call it K, initialize to zero) which is equal to the total multiple of X we must subtract from S to find the sum of all remainders. Thus at this stage we could simply add N - I + 1 to K.
Repeat the above, finding the first element greater than the next limit L = 2X, 3X, ... and so on, until we have passed the end of the array.
Finally, the result is given by S - K * X.
function findSumOfRemainder (A[N], X):
    sort A
    S = sum A
    K = 0
    L = X
    I = 1
    while I <= N:
        I = lowest index such that A[I] >= L (N + 1 if there is none)
        K += N - I + 1
        L += X
    return S - K * X
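A rough Python version of the same idea, using the binary-search variant, might look like this. The names prepare and sum_of_remainders are my own, the list is 0-indexed here (so the count of elements from index i onwards is n - i), and sorting plus summing is assumed to be done once and shared by all queries.

```python
from bisect import bisect_left

def prepare(values):
    """Sort the array and sum it once, before answering any query."""
    sorted_values = sorted(values)
    return sorted_values, sum(sorted_values)

def sum_of_remainders(sorted_values, total, x):
    """Return sum(v % x for v in sorted_values) using the threshold-counting scheme."""
    n = len(sorted_values)
    k = 0      # how many multiples of x must be subtracted in total
    limit = x  # current threshold: x, 2x, 3x, ...
    i = 0
    while i < n:
        # first index whose value is >= limit; search only to the right of the previous one
        i = bisect_left(sorted_values, limit, i)
        k += n - i
        limit += x
    return total - k * x
```

For example, after sorted_values, total = prepare(A), each query X is answered with sum_of_remainders(sorted_values, total, X).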
What is the best way to find I at each stage, and how does it relate to the time-complexity?
Binary search: Since the entire array is sorted, to find the first index I at which A[I] >= L, we can just do a binary search on the array (or succeeding sub-array at each stage of the iteration, bounded by [I, N - 1]). This has complexity O( log[N - I + 1] ).
Linear search: Self-explanatory - increment I until A[I] >= L, taking O( N - I + 1 )
You may dismiss the linear search method as being "stupid" - but let's look at the two different extreme cases. For simplicity we can assume that the values of A are "uniformly" distributed.
(max(A) / X) << N: We will have to compute very few values of I; binary search is the preferred method here because the complexity would be bounded by O([max(A) / X] * log[N]), which is much better than that of linear search O(N).
(max(A) / X) ~ N: We will have to compute many values of I, each separated by only a few indices. In this case the total binary search complexity would be bounded by O(log N) + O(log[N-1]) + O(log[N-2]) + ... ~ O(N log N), which is significantly worse than that of linear search.
So which one do we choose? Well this is where I must get off, because I don't know what the optimal answer would be (if there even is one). But the best I can say is to set some threshold value for the ratio max(A) / X - if smaller then choose binary search, else linear.
I welcome any comments on the above + possible improvements; the range constraint of the values may allow better methods for finding values of I (e.g. radix sort?).
#include <iostream>
#include <numeric>
using namespace std;
int main(){
    int t;
    cin >> t;
    int n;
    cin >> n;
    int arr[n];
    long long int sum = 0;
    for(int i=0;i<n;i++){
        cin >> arr[i];
    }
    cout << accumulate(arr, arr+n, sum) - n << '\n';
}
In case you don't know about accumulate, see the documentation for std::accumulate.

Probability: No of ways to win if you have n dice with m faces each

You are given n dice, each with m faces. You roll all n dice and note the sum of the values shown. If the sum is >= x, you win; otherwise you lose. Find the probability that you win.
I thought of generating all combinations of 1 to m (of size n) and counting only those whose sum is at least x. The total number of outcomes is m^n.
After that it's just the division of the two.
Is there a better way ?
[EDIT: As noted by jpalacek, the time complexity was wrong -- I've now fixed this.]
You can solve this more efficiently with dynamic programming, by first changing it into the question:
How many ways can I get at least x from n dice?
Express this as f(x, n). Then it must be that
f(x, n) = sum(f(x - i, n - 1)) for all 1 <= i <= m.
I.e. if the first die has 1, the remaining n - 1 dice must add up to at least x - 1; if the first die has 2, the remaining n - 1 dice must add up to at least x - 2; and so on.
There are m terms in the sum, so if you memoise this function, it will be O(m^2*n^2), since it will be required to do this summing work at most (m * n) * n times (i.e. once per unique set of inputs to the function, assuming that the first parameter x <= m * n).
As a final step to get a probability, just divide the result of f(x, n) by the total number of possible outcomes, i.e. m^n.
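A minimal memoised sketch of this recurrence in Python could look as follows. The name win_probability and the base cases are my own reading of "at least x": a target of zero or less is met by every outcome of the remaining dice, and a positive target cannot be met with no dice left.

```python
from fractions import Fraction
from functools import lru_cache

def win_probability(n, m, x):
    """Probability that n dice with faces 1..m sum to at least x."""

    @lru_cache(maxsize=None)
    def ways(target, dice):
        # number of outcomes of `dice` dice whose sum is at least `target`
        if target <= 0:
            return m ** dice  # every outcome qualifies
        if dice == 0:
            return 0          # a positive target cannot be reached with no dice
        return sum(ways(target - face, dice - 1) for face in range(1, m + 1))

    return Fraction(ways(x, n), m ** n)
```

Using Fraction keeps the result exact; convert with float(...) if a decimal is wanted.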
Just to add to @j_random_hacker's basically correct answer, you can make it even faster when you note that
f(x, n) = f(x-1, n) - f(x-m-1, n-1) + f(x-1, n-1) if x>m+1
This way, you'll only spend O(1) time calculating each f value.
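Here is one way the O(1)-per-value recurrence could be written bottom-up in Python. This is a sketch under my own conventions: targets at or below zero are all folded into index 0 (they are met by every outcome), which lets the same recurrence be used for every x >= 1 rather than only for x > m+1. The name win_probability_fast is hypothetical.

```python
from fractions import Fraction

def win_probability_fast(n, m, x):
    """Probability that n dice with faces 1..m sum to at least x, in O(n * x) time."""
    x = max(x, 0)
    # prev[t] = number of outcomes of the dice processed so far with sum >= t
    prev = [1] + [0] * x  # zero dice: only the target 0 is met
    for dice in range(1, n + 1):
        cur = [m ** dice] + [0] * x  # target 0 is met by every outcome
        for t in range(1, x + 1):
            # f(t, dice) = f(t-1, dice) - f(t-m-1, dice-1) + f(t-1, dice-1),
            # with negative targets clamped to index 0
            cur[t] = cur[t - 1] - prev[max(t - m - 1, 0)] + prev[t - 1]
        prev = cur
    return Fraction(prev[x], m ** n)
```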
//Passing curFace value will disallow duplicate combinations
//For 3 dice and sum 8, 2 4 2 and 2 2 4 are the same combination, so they should be counted as one
int sums(int totSum,int noDices,int mFaces,int curFace,HashMap<String,Integer> map)
{
    int count=0;
    if (noDices<=0 || totSum<=0)
        return 0;
    if (noDices==1)
    {
        //the last die may not show less than curFace
        if (totSum>=curFace && totSum<=mFaces)
            return 1;
        return 0;
    }
    //curFace is part of the key because the count depends on it
    String key=noDices+"-"+totSum+"-"+curFace;
    if (map.containsKey(key))
        return map.get(key);
    for (int i=curFace;i<=mFaces;i++)
        count+=sums(totSum-i,noDices-1,mFaces,i,map);
    map.put(key,count);
    return count;
}

Generating random sublist from ordered list that maintains ordering

Consider a problem where a random sublist of k items, Y, must be selected from X, a list of n items, where the items in Y must appear in the same order as they do in X. The selected items in Y need not be distinct. One solution is this:
for i = 1 to k
    A[i] = floor(rand * n) + 1
    Y[i] = X[A[i]]
sort Y according to the ordering of A
However, this has running time O(k log k) due to the sort operation. To remove this it's tempting to try:
high_index = n
for i = 1 to k
    index = floor(rand * high_index) + 1
    Y[k - i + 1] = X[index]
    high_index = index
But this gives a clear bias to the returned list due to the uniform index selection. It feels like a O(k) solution is attainable if the indices in the second solution were distributed non-uniformly. Does anyone know if this is the case, and if so what properties the distribution the marginal indices are drawn from has?
An unbiased O(n+k) solution is straightforward. In high-level pseudocode:
create an empty histogram of size n [initialized with all elements as zeros]
populate it with k uniformly distributed variables at range. (do k times histogram[inclusiveRand(1,n)]++)
iterate the initial list [A], while decreasing elements in the histogram and appending elements to the result list.
Explanation [edit]:
The idea is to choose k elements out of n at random, with uniform
distribution for each, and create a histogram out of it.
This histogram now contains for each index i, how many times A[i] will appear in the resulting Y list.
Now, iterate the list A in-order, and for each element i, insert A[i] into the resulting Y list histogram[i] times.
This guarantees you maintain the order because you insert elements in order, and "never go back".
It also guarantees unbiased solution since for each i,j,K: P(histogram[i]=K) = P(histogram[j]=K), so for each K, each element has the same probability to appear in the resulting list K times.
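In Python the histogram approach could be sketched like this. The function name ordered_sample and the optional rng parameter are illustrative choices of mine, not part of the question.

```python
import random

def ordered_sample(items, k, rng=random):
    """Pick k items with replacement, returned in their original order, in O(n + k)."""
    n = len(items)
    histogram = [0] * n
    for _ in range(k):
        histogram[rng.randrange(n)] += 1  # uniform choice of an index
    result = []
    for item, count in zip(items, histogram):
        result.extend([item] * count)  # emit each item as often as it was drawn
    return result
```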
I believe it can be done in O(k) using the order statistics [X(i)] but I cannot figure it out though :\
By your first algorithm, it suffices to generate k uniform random samples of [0, 1) in sorted order.
Let X1, ..., Xk be these samples. Given that Xk = x, the conditional distribution of X1, ..., Xk-1 is k - 1 uniform random samples of [0, x) in sorted order, so it suffices to sample Xk and recurse.
What's the probability that Xk < x? Each of k independent samples of [0, 1) must be less than x, so the answer (the cumulative distribution function for Xk) is x^k. To sample according to the cdf, all we have to do is invert it on a uniform random sample of [0, 1): pow(random(), 1.0 / k).
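As a sketch of how this recursion might be unrolled into a loop (the names below are mine, and the final clamp to n - 1 is a guard I added against floating-point rounding up to 1.0):

```python
import random

def sorted_uniform_samples(k, rng=random):
    """Generate k uniform samples of [0, 1) already in ascending order, in O(k)."""
    samples = [0.0] * k
    upper = 1.0
    for remaining in range(k, 0, -1):
        # the maximum of `remaining` uniform samples on [0, upper) is
        # upper * U ** (1 / remaining) for a uniform U, by inverting the cdf
        upper *= rng.random() ** (1.0 / remaining)
        samples[remaining - 1] = upper
    return samples

def ordered_sample_indices(n, k, rng=random):
    """Turn the sorted samples into sorted 0-based indices into a list of n items."""
    return [min(n - 1, int(x * n)) for x in sorted_uniform_samples(k, rng)]
```

The sublist is then [X[i] for i in ordered_sample_indices(len(X), k)].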
Here's an (expected) O(k) algorithm I actually would consider implementing. The idea is to dump the samples into k bins, sort each bin, and concatenate. Here's some untested Python:
from random import randrange

def samples(n, k):
    bins = [[] for i in range(k)]
    for i in range(k):
        x = randrange(n)
        bins[(x * k) // n].append(x)
    result = []
    for bin in bins:
        bin.sort()
        result.extend(bin)
    return result
Why is this efficient in expectation? Let's suppose we use insertion sort on each bin (each bin has expected size O(1)!). On top of operations that are O(k), we're going to pay proportionally to the sum of the squares of the bin sizes, which is basically the number of collisions. Since the probability of two samples colliding is at most something like 4/k and we have O(k^2) pairs of samples, the expected number of collisions is O(k).
I suspect rather strongly that the O(k) guarantee can be made with high probability.
You can use counting sort to sort Y and thus make the sorting linear with respect to k. However for that you need one additional array of length n. If we assume you have already allocated that, you may execute the code you are asking for arbitrary many times with complexity O(k).
The idea is just as you describe, but I will use one more array cnt of size n that I assume is initialized to 0, and another "stack" st that I assume is empty.
for i = 1 to k
    A[i] = floor(rand * n) + 1
    cnt[A[i]] += 1
    if cnt[A[i]] == 1 // Needed to be able to traverse the inserted elements faster
        st.push(A[i])
for elem in st
    for i = 1 to cnt[elem]
        Y.add(X[elem])
for elem in st
    cnt[elem] = 0
EDIT: as mentioned by oldboy, what I state in the post is not true - I still have to sort st, which might be a bit better than the original proposition but not by much. So this approach will only be good if k is comparable to n, and then we just iterate through cnt linearly and construct Y this way. This way st is not needed:
for i = 1 to k
    A[i] = floor(rand * n) + 1
    cnt[A[i]] += 1
for i = 1 to n
    for j = 1 to cnt[i]
        Y.add(X[i])
    cnt[i] = 0
For the first index in Y, the distribution of indices in X is given by:
P(x; n, k) = binomial(n - x + k - 2, k - 1) / norm
where binomial denotes calculation of the binomial coefficient, and norm is a normalisation factor, equal to the total number of possible sublist configurations.
norm = binomial(n + k - 1, k)
So for k = 5 and n = 10 we have:
norm = 2002
P(x = 0) = 0.357, P(x <= 0) = 0.357
P(x = 1) = 0.247, P(x <= 1) = 0.604
P(x = 2) = 0.165, P(x <= 2) = 0.769
P(x = 3) = 0.105, P(x <= 3) = 0.874
P(x = 4) = 0.063, P(x <= 4) = 0.937
... (we can continue this up to x = 9)
We can sample the X index of the first item in Y from this distribution (call it x1). The distribution of the second index in Y can then be sampled in the same way with P(x; (n - x1), (k - 1)), and so on for all subsequent indices.
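The probabilities can be checked with a few lines of Python. This is only a sketch; first_index_distribution is a name I made up, and x is taken to run over the 0-based indices 0..n-1.

```python
from math import comb

def first_index_distribution(n, k):
    """P(x; n, k) for x = 0..n-1: distribution of the X index of the first item of Y."""
    norm = comb(n + k - 1, k)  # number of sorted sublists of size k with repetition
    return [comb(n - x + k - 2, k - 1) / norm for x in range(n)]

if __name__ == "__main__":
    cumulative = 0.0
    for x, p in enumerate(first_index_distribution(10, 5)):
        cumulative += p
        print(f"P(x = {x}) = {p:.3f}, P(x <= {x}) = {cumulative:.3f}")
```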
My feeling now is that the problem is not solvable in O(k), because in general we are unable to sample from the distribution described in constant time. If k = 2 then we can solve in constant time using the quadratic formula (because the probability function simplifies to 0.5(x^2 + x)) but I can't see a way to extend this to all k (my maths isn't great though).
The original list X has n items. There are 2**n possible sublists, since every item will or will not appear in the resulting sublist: each item adds a bit to the enumeration of the possible sublists. You could view this enumeration as a bitword of n bits.
Since you only want sublists with k items, you are interested in bitwords with exactly k bits set.
A practical algorithm could pick (or not pick) the first element from X, and then recurse into the rightmost n-1 substring of X, taking into account the accumulated number of chosen items. Since the X list is processed in order, the Y list will also be in order.
#include <stdio.h>
#include <string.h>

unsigned pick_k_from_n(char target[], char src[], unsigned k, unsigned n, unsigned done);

unsigned pick_k_from_n(char target[], char src[]
        , unsigned k, unsigned n, unsigned done)
{
    unsigned count=0;

    if (k>n) return 0;
    if (k==0) {
        target[done] = 0;
        return 1;
    }
    if (n > 0) {
        count += pick_k_from_n(target, src+1, k, n-1, done);
        target[done] = *src;
        count += pick_k_from_n(target, src+1, k-1, n-1, done+1);
    }
    return count;
}

int main(int argc, char **argv) {
    char result[20];
    char *domain = "OmgWtf!";
    unsigned cnt ,len, want;
    want = 3;
    switch (argc) {
    case 3:
        domain = argv[2];
        /* fall through */
    case 2:
        sscanf(argv[1], "%u", &want);
        /* fall through */
    case 1:
        break;
    }
    len = strlen(domain);
    cnt = pick_k_from_n(result, domain, want, len, 0);
    fprintf(stderr, "Count=%u\n", cnt);
    return 0;
}
Removing the recursion is left as an exercise to the reader.
Some output:
plasser#pisbak:~/hiero/src$ ./a.out 3 ABBA

Integer distance

As a single operation between two positive integers we understand
multiplying one of the numbers by some prime number or dividing it by
such (provided it can be divided by this prime number without
the remainder). The distance between a and b, denoted d(a,b), is the
minimum number of operations needed to transform number a into number
b. For example, d(69,42)=3.
Keep in mind that our function d indeed has characteristics of the
distance - for any positive ints a, b and c we get:
a) d(a,a)==0
b) d(a,b)==d(b,a)
c) the inequality of a triangle d(a,b)+d(b,c)>=d(a,c) is fulfilled.
You'll be given a sequence of positive ints a_1, a_2,...,a_n. For every a_i of them
output such a_j (j!=i) that d(a_i, a_j) is as low as possible. For example, the sequence of length 6: {1,2,3,4,5,6} should output {2,1,1,2,1,2}.
This seems really hard to me. What I think would be useful is:
a) if a_i is prime, we are unable to make anything less than a_i (unless it's 1) so the only operation allowed is multiplication. Therefore, if we have 1 in our set, for every prime number d(this_number, 1) is the lowest.
b) also, for 1 d(1, any_prime_number) is the lowest.
c) for a non-prime number we check if we have any of its factors in our set or multiplication of its factors
That's all I can deduce, though. The worst part is I know it will take an eternity for such an algorithm to run and check all the possibilities... Could you please try to help me with it? How should this be done?
Indeed, you can represent any number N as 2^n1 * 3^n2 * 5^n3 * 7^n4 * ... (most of the n's are zeroes).
This way you set a correspondence between a number N and infinite sequence (n1, n2, n3, ...).
Note that your operation is just adding or subtracting 1 at exactly one of the appropriate sequence's places.
Let N and M be two numbers, and their sequences be (n1, n2, n3, ...) and (m1, m2, m3, ...).
The distance between the two numbers is indeed nothing but |n1 - m1| + |n2 - m2| + ...
So, in order to find out the closest number, you need to calculate the sequences for all the input numbers (this is just decomposing them into primes). Having this decomposition, the calculation is straightforward.
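As a quick illustration of the exponent-sequence idea, here is a small self-contained Python sketch. The function names are mine, factorization uses plain trial division, and breaking ties by the smaller value is an assumption that happens to match the sample output.

```python
from collections import Counter

def factorize(n):
    """Trial division: map each prime factor of n to its exponent."""
    exponents = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            exponents[d] += 1
            n //= d
        d += 1
    if n > 1:
        exponents[n] += 1  # whatever is left is a single prime
    return exponents

def distance(a, b):
    """Sum of |exponent in a - exponent in b| over all primes involved."""
    fa, fb = factorize(a), factorize(b)
    return sum(abs(fa[p] - fb[p]) for p in fa.keys() | fb.keys())

def closest(numbers):
    """For each a_i, the a_j (j != i) at minimal distance; ties go to the smaller value."""
    result = []
    for i, a in enumerate(numbers):
        others = (b for j, b in enumerate(numbers) if j != i)
        result.append(min(others, key=lambda b: (distance(a, b), b)))
    return result
```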
In fact, you don't need the exact position of your prime factor: you just need to know, which is the exponent for each of the prime divisors.
This is the simple procedure for converting the number into the chain representation:
#include <map>
typedef std::map<unsigned int, unsigned int> ChainRepresentation;
// maps prime factor -> exponent, default exponent is of course 0
void convertToListRepresentation(int n, ChainRepresentation& r)
{
    // find a divisor
    int d = 2;
    while (n > 1) {
        for (; n % d; d++) {
            if (n/d < d) { // n is prime
                r[n]++;
                return;
            }
        }
        r[d]++;
        n /= d;
    }
}
... and the code for distance:
#include <set>
unsigned int chainDistance(ChainRepresentation& c1, ChainRepresentation& c2)
{
    if (&c1 == &c2)
        return 0; // protect from modification done by [] during self-comparison
    int result = 0;
    std::set<unsigned int> visited;
    for (ChainRepresentation::const_iterator it = c1.begin(); it != c1.end(); ++it) {
        unsigned int factor = it->first;
        unsigned int exponent = it->second;
        unsigned int exponent2 = c2[factor];
        unsigned int expabsdiff = (exponent > exponent2) ?
            exponent - exponent2 : exponent2 - exponent;
        result += expabsdiff;
        visited.insert(factor);
    }
    for (ChainRepresentation::const_iterator it = c2.begin(); it != c2.end(); ++it) {
        unsigned int factor = it->first;
        if (visited.find(factor) != visited.end())
            continue; // already handled in the first loop
        unsigned int exponent2 = it->second;
        // unsigned int exponent = 0;
        result += exponent2;
    }
    return result;
}
For the given limits: 100_000 numbers not greater than a million the most-straightforward algorithm works (1e10 calls to distance()):
For each number in the sequence print its closest neighbor (as defined by minimal distance):
solution = []
for i, ai in enumerate(numbers):
    all_except_i = (aj for j, aj in enumerate(numbers) if j != i)
    solution.append(min(all_except_i, key=lambda x: distance(x, ai)))
print(', '.join(map(str, solution)))
Where distance() can be calculated as (see @Vlad's explanation):
from collections import Counter

def distance(a, b):
    """
    a = p1**n1 * p2**n2 * p3**n3 ...
    b = p1**m1 * p2**m2 * p3**m3 ...
    distance = |m1-n1| + |m2-n2| + |m3-n3| ...
    """
    diff = Counter(prime_factors(b))
    diff.subtract(prime_factors(a))
    return sum(abs(d) for d in diff.values())
Where prime_factors() returns prime factors of a number with corresponding multiplicities {p1: n1, p2: n2, ...}:
from itertools import islice

uniq_primes_factors = dict(islice(prime_factors_gen(), max(numbers)))

def prime_factors(n):
    return dict(multiplicities(n, uniq_primes_factors[n]))
Where multiplicities() function given n and its factors returns them with their corresponding multiplicities (how many times a factor divides the number without a remainder):
def multiplicities(n, factors):
    assert n > 0
    for prime in factors:
        alpha = 0 # multiplicity of `prime` in `n`
        q, r = divmod(n, prime)
        while r == 0: # `prime` is a factor of `n`
            n = q
            alpha += 1
            q, r = divmod(n, prime)
        yield prime, alpha
prime_factors_gen() yields prime factors for each natural number. It uses Sieve of Eratosthenes algorithm to find prime numbers. The implementation is based on gen_primes() function by #Eli Bendersky:
from collections import defaultdict
from itertools import count

def prime_factors_gen():
    """Yield prime factors for each natural number."""
    D = defaultdict(list) # nonprime -> prime factors of `nonprime`
    D[1] = [] # `1` has no prime factors
    for q in count(1): # Sieve of Eratosthenes algorithm
        if q not in D: # `q` is a prime number
            D[q + q] = [q]
            yield q, [q]
        else: # q is a composite
            for p in D[q]: # `p` is a factor of `q`: `q == m*p`
                # therefore `p` is a factor of `p + q == p + m*p` too
                D[p + q].append(p)
            yield q, D[q]
            del D[q]
See full example in Python.
2, 1, 1, 2, 1, 2
Without bounds on how large your numbers can be and how many numbers can be on the input, we can't really deduce it will take "an eternity" to complete. I am tempted to suggest the most "obvious" solution I can think of
Given the factorization of the numbers it is very easy to find their distance
60 = (2^2)*(3^1)*(5^1)*(7^0)
42 = (2^1)*(3^1)*(5^0)*(7^1)
distance = 3
Calculating this factorization using the naive trial division should take at most O(sqrt(N)) time per number, where N is the number being factorized.
Given the factorizations, you only have O(n^2) combinations to worry about, where n is the count of numbers. If you store all the factorizations so that you only compute them once, this step shouldn't take that long unless you have a really large amount of numbers.
You do wonder if there is a faster algorithm though. Perhaps it is possible to do some greatest common divisor trick to avoid computing large factorizations and perhaps we can use some graph algorithms to find the distances in a smarter way.
Haven't really thought this through, but it seems to me that to get from prime A to prime B you multiply A * B and then divide by A.
If you thus break the initial non-prime A & B into their prime factors, factor out the common prime factors, and then use the technique in the first paragraph to convert the unique primes, you should be following a minimal path to get from A to B.
