Calculating the sum of phi(k) for 1 <= k <= N? - algorithm

I need to calculate the sum of phi(k) for 1 <= k <= N, where N = 1,000,000 and phi(k) is Euler's totient function. This is for a Project Euler problem. I've already solved it using the approach from a previous StackOverflow question that asks to calculate each value of phi(k) for 1 < k < N. However, I wonder whether any further optimizations can be made, since we only need the final sum of the phi(k) and not the individual value of each addend.

The Wikipedia page on Euler's totient function includes a formula due to Arnold Walfisz for the sum of φ(k) for k from 1 to n:
sum(1<=k<=n) φ(k) = (1 + sum(1<=k<=n) μ(k)*(floor(n/k))^2) / 2
(It's a lot easier to read on Wikipedia.)
The Möbius function μ(k) is 0 if k has any squared prime factor, and otherwise (-1)^f where f is the number of unique prime factors in k. (In other words, it is 1 if k's prime factorization has an even number of unique primes; -1 if it has an odd number; and 0 if some prime appears more than once.) You should be able to use a modified sieve to rapidly compute μ(k).
That might turn out to be a bit faster.
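For concreteness, here is a minimal Python sketch of that approach (my own illustration, not code from the question): sieve μ(k) for k up to n with a linear sieve, then apply the formula above.

def totient_sum(n):
    # Linear sieve for the Mobius function mu(1..n).
    mu = [0] * (n + 1)
    mu[1] = 1
    primes = []
    is_composite = [False] * (n + 1)
    for i in range(2, n + 1):
        if not is_composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > n:
                break
            is_composite[i * p] = True
            if i % p == 0:
                mu[i * p] = 0       # i*p has a squared prime factor
                break
            mu[i * p] = -mu[i]      # one more distinct prime factor

    # sum(phi(k), 1 <= k <= n) = (1 + sum(mu(k) * floor(n/k)^2, 1 <= k <= n)) / 2
    total = 1
    for k in range(1, n + 1):
        total += mu[k] * (n // k) ** 2
    return total // 2

print(totient_sum(10))  # 32 = phi(1) + ... + phi(10)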


Idea of dynamic programming solution

Given a natural number N (1 <= N <= 2000), count the number of sets of natural numbers whose sum equals N, given that the ratio of any two elements of the set is at least 2
(for any x, y in the given set: max(x, y) / min(x, y) >= 2).
I am trying to use the given ratio so that the sum can be counted with the geometric progression formula, but I haven't succeeded yet. Apparently it's necessary to come up with a dynamic programming solution, but I have no idea how to come up with the formula.
As Stef suggested in the comments, if you count the number of ways you can make n, using numbers that are at most k, you can calculate this using dynamic programming. For a given n, k, either you use k or you don't: if you do, then you have n-k left, and can use numbers <= k/2, and if you don't, then you still have n, and can use numbers <= k-1. It's very similar to a coin change algorithm, or to a standard algorithm for counting partitions.
With that, here's a program that prints out the values up to n=2000 in the sequence:
N = 2000
A = [[0] * (i+1) for i in range(N+1)]
A[0][0] = 1
for n in range(1, N+1):
    for k in range((n+1)//2, n+1):
        A[n][k] = A[n-k][min(n-k, k//2)] + A[n][k-1]
for i in range(N+1):
    print(i, A[i][i])
It has a couple of optimizations: A[n][k] is the same as A[n][n] for k > n, and A[n][k] = 0 when n > 2k - 1 (because if you use k, the largest total you can reach is at most k + k/2 + k/4 + ... <= 2k - 1 -- the infinite sum is 2k, but with integer values you can never achieve it). These two optimizations each give roughly a factor-of-2 speedup compared to computing the whole (n+1)x(n+1) table.
With these two optimizations, and the array-based bottom-up dynamic programming approach, this prints out all the solutions in around 0.5s on my machine.

Efficiently generate primes in Python and calculate complexity

Generating prime numbers from 1 to n in Python 3: how can I improve the efficiency, and what is the complexity?
Input: A number, max (a large number)
Output: All the primes from 1 to max
Output is in the form of a list and will be [2,3,5,7,11,13,.......]
The code attempts to perform this task in an efficient way (least time complexity).
from math import sqrt
max = (10**6)*3
print("\nThis code prints all primes till: " , max , "\n")
list_primes=[2]
def am_i_prime(num):
    """
    Input/Parameter the function takes: An integer number
    Output: returns True, if the number is prime and False if not
    """
    decision = True
    i = 0
    while list_primes[i] <= sqrt(num):  # till sqrt(n) to save comparisons
        if num % list_primes[i] == 0:
            decision = False
            break
            # break is inserted so that we get out of comparisons faster
            # e.g. for 1568, we should break from the loop as soon as we know that 1568 % 2 == 0
        i += 1
    return decision

for i in range(3, max, 2):  # starts from 3 as our list contains 2 from the beginning
    if am_i_prime(i) == True:
        list_primes.append(i)  # if a number is found to be prime, we append it to our list of primes
print(list_primes)
How can I make this faster? Where can I improve?
What is the time complexity of this code? Which steps are inefficient?
In what ways is the Sieve of Eratosthenes more efficient than this?
How it works for the first few iterations:
We have a list_primes which contains prime numbers. It initially contains only 2.
We go to the next number, 3. Is 3 divisible by any of the numbers in list_primes? No! We append 3 to list_primes. Right now, list_primes=[2,3]
We go to the next number 4. Is 4 divisible by any of the numbers in list_primes? Yes (4 is divisible by 2). So, we don't do anything. Right now list_primes=[2,3]
We go to the next number, 5. Is 5 divisible by any of the numbers in list_primes? No! We append 5 to list_primes. Right now, list_primes=[2,3,5]
We go to the next number, 6. Is 6 divisible by any of the numbers in list_primes? Yes (6 is divisible by 2 and also divisible by 3). So, we don't do anything. Right now list_primes=[2,3,5]
And so on...
Interestingly, it takes a rather deep mathematical theorem to prove that your algorithm is correct at all. The theorem is: "For every n ≥ 2, there is a prime number between n and n^2". I know it has been proven, and much stricter bounds are proven, but I must admit I wouldn't know how to prove it myself. And if this theorem is not correct, then the loop in am_i_prime can go past the bounds of the array.
The number of primes ≤ k is O (k / log k) - this is again a very deep mathematical theorem. Again, beyond me to prove.
But anyway, there are about n / log n primes up to n, and for these primes the loop will iterate through all primes up to n^(1/2), and there are O (n^(1/2) / log n) of them.
So for the primes alone, the runtime is therefore O (n^1.5 / log^2 n), so that is a lower bound. With some effort it should be possible to prove that for all numbers, the runtime is asymptotically the same.
O (n^1.5 / log n) is obviously an upper bound, but experimentally the number of divisions to find all primes ≤ n seems to be ≤ 2 n^1.5 / log^2 n, where log is the natural logarithm.
The following rearrangement and optimization of your code will reach your maximum in nearly 1/2 the time of your original code. It combines your top level loop and predicate function into a single function to eliminate overhead and manages squares (square roots) more efficiently:
def get_primes(maximum):
    primes = []
    if maximum > 1:
        primes.append(2)
        squares = [4]
        for number in range(3, maximum, 2):
            i = 0
            while squares[i] <= number:
                if number % primes[i] == 0:
                    break
                i += 1
            else:  # no break
                primes.append(number)
                squares.append(number * number)
    return primes

maximum = 10 ** 6 * 3
print(get_primes(maximum))
However, a sieve-based algorithm will easily beat this (as it avoids division and/or multiplication). Your code has a bug: setting max = 1 will create the list [2] instead of the correct answer of an empty list. Always test both ends of your limits.
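For comparison, here is a minimal Sieve of Eratosthenes sketch (my own illustration, not from the original post) that produces the same list of primes below maximum without any division, and returns the empty list for maximum <= 2:

def sieve_primes(maximum):
    # is_prime[i] means "i has not been crossed out yet".
    is_prime = [True] * max(maximum, 2)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(maximum ** 0.5) + 1):
        if is_prime[p]:
            # Cross out multiples of p, starting at p*p.
            for multiple in range(p * p, maximum, p):
                is_prime[multiple] = False
    return [i for i in range(maximum) if is_prime[i]]

print(len(sieve_primes(10 ** 6 * 3)))  # count of primes below 3,000,000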
O(N**2)
Approximately speaking, the first call to am_i_prime does 1 comparison, the second does 2, ..., so the total count is 1 + 2 + ... + N, which is N * (N + 1) / 2, which is of order N squared.

Count "cool" divisors of given number N

I'm trying to solve a pretty complex problem involving divisors and number theory.
Namely, for a given number m we say that k is a cool divisor if k < m, k | m (k divides m evenly), and, for a given number n, k^n (k to the power of n) is not a divisor of m. Let s(x) be the number of cool divisors of x.
Now for given a and b we should find D = s(a) + s(a+1) + s(a+2) + s(a+3) + ... + s(a+b).
Limits for all values:
(1 <= a <= 10^6), (1 <= b <= 10^7), (2<=n<=10)
Example
Let's say a=32, b=1, n=3;
x = 32, n = 3: the divisors of 32 are {1,2,4,8,16,32}. However, only {4,8,16} satisfy the conditions, so s(32) = 3.
x = 33, n = 3: the divisors of 33 are {1,3,11,33}. Only the numbers {3,11} satisfy the conditions, so s(33) = 2.
D = s(32) + s(33) = 3 + 2 = 5
What I have tried
We should answer all those queries for 100 test cases within a 3 second time limit.
I have two ideas. The first one: I iterate over the interval [a, a+b] and for each value i in the range I check how many cool divisors it has; this can be checked in O(sqrt(N)) per value (treating the test of whether a divisor raised to the power n still divides the value as O(1)), so the total complexity is O(B*sqrt(B)).
The second one: I'm not sure whether it will work or how fast it will be. First I do a precomputation: a for loop iterates over i from 2 to N, where N = 10^7, and for each i it visits every number j in [2, N] that has i as a divisor; if i to the power of n is not a divisor of j, then we record that j has one more cool divisor. With this I think the complexity will be O(N log N) for the precomputation and O(B) for the answers.
Your first idea works but you can improve it.
Instead of checking every number from 1 to sqrt(N) to see whether it is a cool divisor, you can factorize N = p0^q0 * p1^q1 * ... * pk^qk. The number of cool divisors is then (q0+1)(q1+1)...(qk+1) - (floor(q0/n)+1)(floor(q1/n)+1)...(floor(qk/n)+1) - 1: the first product counts all divisors of N, the second counts the divisors d with d^n | N, and the final -1 excludes N itself (cool divisors must be strictly less than N). For the example above, s(32) = (5+1) - (floor(5/3)+1) - 1 = 3.
So you can first preprocess and find all the prime numbers up to a+b using an existing algorithm like the Sieve of Eratosthenes, and then for each number N in [a, a+b] you do a factorization. The complexity should be roughly O(B log B).
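A rough Python sketch of this first approach (my own illustration; the helper names are made up, and it assumes n >= 2): build a smallest-prime-factor sieve up to a+b, factor each value, and apply the divisor-count formula above.

def sum_cool_divisors(a, b, n):
    limit = a + b
    # Smallest-prime-factor sieve, used for fast factorization.
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:                              # p is prime
            for multiple in range(p * p, limit + 1, p):
                if spf[multiple] == multiple:
                    spf[multiple] = p

    total = 0
    for m in range(a, a + b + 1):
        if m < 2:
            continue                                  # 1 has no cool divisors
        divisors = 1      # number of divisors of m
        powers = 1        # number of divisors d of m with d^n | m
        x = m
        while x > 1:
            p = spf[x]
            q = 0
            while x % p == 0:
                x //= p
                q += 1
            divisors *= q + 1
            powers *= q // n + 1
        total += divisors - powers - 1                # the -1 excludes m itself
    return total

print(sum_cool_divisors(32, 1, 3))  # 5, matching the worked example above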
Your second idea works as well.
For each number i in [2, a+b], you can just check the multiples of i in [a, a+b] and see whether i is a cool divisor of those multiples. The complexity should be O(B log B) as well. One trick that speeds this idea up is that you don't actually need divide/mod operations to check whether i is a cool divisor of each multiple. You can compute the first number m in [a, a+b] with i^n | m; this is m = ceiling(a / i^n) * i^n. Then, stepping through the multiples of i, you know that i^n does not divide m + p*i for p in [1, i^(n-1) - 1] and divides it again at p = i^(n-1). In other words, i fails to be a cool divisor only once every i^(n-1) multiples, and you can locate those positions without any divide/mod operations, which speeds the program up.
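A corresponding sketch of this second idea (again my own illustration): for every i, add one to each multiple of i in the range, then subtract one from each multiple of i^n, so no divide/mod is needed per multiple. The divisor 1 is skipped because 1^n always divides m, so 1 is never cool.

def sum_cool_divisors_by_multiples(a, b, n):
    hi = a + b
    cool = [0] * (b + 1)                    # cool[j] accumulates s(a + j)
    for i in range(2, hi + 1):
        step = i ** n
        first = ((a + i - 1) // i) * i      # first multiple of i in [a, hi]
        for j in range(first, hi + 1, i):
            if j > i:                       # cool divisors must be < j
                cool[j - a] += 1
        if step <= hi:
            first_bad = ((a + step - 1) // step) * step
            for j in range(first_bad, hi + 1, step):
                if j > i:                   # i^n | j, so i is not cool for j
                    cool[j - a] -= 1
    return sum(cool)

print(sum_cool_divisors_by_multiples(32, 1, 3))  # 5 again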

Smart algorithm for finding the divisors of a binomial coefficient

I'm interested in tips for my algorithm that I use to find the divisors of a very large number, more specifically "n over k", i.e. C(n, k). The number itself can get very large, so time complexity really needs to be taken into the 'equation', so to speak.
The formula for n over k is n! / (k!(n-k)!) and I understand that I must try to exploit the fact that factorials are somehow 'recursive', but I haven't yet read much discrete mathematics, so the problem is both of a mathematical and a programming nature.
I guess what I'm really looking for are just some tips heading me in the right direction - I'm really stuck.
First you could start with the fact that: C(n,k) = (n/k) C(n-1,k-1).
You can prove that C(n,k) is divisible by n/gcd(n,k).
If n is prime then n divides C(n,k).
Check Kummer's theorem: if p is a prime number, n a positive number, and k a positive number with 0< k < n then the greatest exponent r for which p^r divides C(n,k) is the number of carries needed in the subtraction n-k in base p.
Let us suppose that n>4 :
if p>n then p cannot divide C(n,k) because in base p, n and k are only one digit wide → no carry in the subtraction
so we have to check for prime divisors in [2;n]. As C(n,k)=C(n,n-k) we can suppose k≤n/2 and n/2≤n-k≤n
for the prime divisors in the range [n/2;n] we have n/2 < p ≤ n, or equivalently p ≤ n < 2p. We have p ≥ 2, so p ≤ n < p², which implies that n has exactly 2 digits when written in base p and the first digit has to be 1. As k ≤ n/2 < p, k can only be one digit wide. Either the subtraction has one carry, and exactly one, which happens precisely when n-k < p, in which case p divides C(n,k); or the subtraction has no carry and p does not divide C(n,k).
The first result is :
every prime number in (n-k;n] is a prime divisor of C(n,k) with exponent 1.
no prime number in [n/2;n-k] is a prime divisor of C(n,k).
in [sqrt(n); n/2] we have 2p ≤ n ≤ p², so n is exactly 2 digits wide in base p, and k < n implies k has at most 2 digits. There are two cases: exactly one carry, or no carry at all. A carry occurs exactly when the last digit of n is smaller than the last digit of k, i.e. when n mod p < k mod p.
The second result is :
For every prime number p in [sqrt(n);n/2]
p divides C(n;k) with exponent 1 iff n mod p < k mod p
p does not divide C(n;k) iff n mod p ≥ k mod p
in the range [2; sqrt(n)] we have to check all the prime numbers. It's only in this range that a prime divisor can have an exponent greater than 1.
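To make the carry counting concrete, here is a small Python sketch (my own illustration) of Kummer's theorem: the exponent of p in C(n,k) equals the number of borrows when subtracting k from n in base p.

def kummer_exponent(n, k, p):
    # Count the borrows ("carries") in the base-p subtraction n - k.
    carries = 0
    borrow = 0
    while n > 0 or k > 0:
        digit = (n % p) - (k % p) - borrow
        borrow = 1 if digit < 0 else 0
        carries += borrow
        n //= p
        k //= p
    return carries

print(kummer_exponent(10, 4, 2))  # 1, since C(10, 4) = 210 = 2 * 3 * 5 * 7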
There will be an awful lot of divisors. If you only need prime divisors, then this is easy for each prime: the exponent of p in n! is [n/p] + [n/p^2] + [n/p^3] + ... where [x] denotes the integer part of x. There will be only a finite number of non-zero terms. You can reuse the result of [n/p^t] to calculate [n/p^(t+1)].
As an example, how many zeroes are at the end of 2022!? Let's find the number of times 5 divides 2022!
[2022/5] = 404
[404/5] = 80
[80/5] = 16
[16/5] = 3
[3/5] = 0
So 5 divides 2022! exactly 404+80+16+3 = 503 times, and 2 obviously divides it more times than that, so 10 = 5*2 divides it 503 times. Check.
So in order to find how many times a prime p divides a binomial coefficient C(n,k), do the above calculation for n, k and n-k separately, and subtract:
the exponent of p in C(n,k) = (the exponent of p in n!) -
(the exponent of p in k!) -
(the exponent of p in (n-k)!)
Now repeat this for all prime numbers less than or equal to sqrt(n) (larger primes have exponent at most 1 and are covered by the interval results in the other answer), and you are done.
This is essentially the same calculation as in the other answer but without computing numbers base p explicitly, so it could be a bit easier to perform in practice.
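A short Python sketch of this calculation (my own illustration, with made-up function names):

def factorial_exponent(n, p):
    # Legendre's formula: exponent of prime p in n! is [n/p] + [n/p^2] + [n/p^3] + ...
    count = 0
    while n:
        n //= p            # reuse [n/p^t] to get [n/p^(t+1)]
        count += n
    return count

def binomial_exponent(n, k, p):
    # Exponent of prime p in C(n, k) = n! / (k! (n-k)!)
    return (factorial_exponent(n, p)
            - factorial_exponent(k, p)
            - factorial_exponent(n - k, p))

print(factorial_exponent(2022, 5))   # 503, as in the worked example above
print(binomial_exponent(10, 4, 2))   # 1, since C(10, 4) = 210 = 2 * 3 * 5 * 7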

Calculating sum of geometric series (mod m)

I have a series
S = i^m + i^(2m) + ... + i^(km) (mod m)
0 <= i < m, k may be very large (up to 100,000,000), m <= 300000
I want to find the sum. I cannot apply the geometric progression (GP) formula, because then the result will have a denominator and I will have to find a modular inverse, which may not exist (if the denominator and m are not coprime).
So I made an alternate algorithm making an assumption that these powers will make a cycle of length much smaller than k (because it is a modular equation and so I would obtain something like 2,7,9,1,2,7,9,1....) and that cycle will repeat in the above series. So instead of iterating from 0 to k, I would just find the sum of numbers in a cycle and then calculate the number of cycles in the above series and multiply them. So I first found i^m (mod m) and then multiplied this number again and again taking modulo at each step until I reached the first element again.
But when I actually coded the algorithm, for some values of i I got cycles of very large size, so the program took a long time to terminate; hence my assumption is incorrect.
So is there any other pattern we can find out? (Basically I don't want to iterate over k.)
So please give me an idea of an efficient algorithm to find the sum.
This is the algorithm for a similar problem I encountered
You probably know that one can calculate the power of a number in logarithmic time. You can also do so for calculating the sum of the geometric series. Since it holds that
1 + a + a^2 + ... + a^(2*n+1) = (1 + a) * (1 + (a^2) + (a^2)^2 + ... + (a^2)^n),
you can recursively calculate the geometric series on the right hand to get the result.
This way you do not need division, so you can take the remainder of the sum (and of intermediate results) modulo any number you want.
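As a concrete illustration (my own sketch, not code from the answer), this identity gives a simple recursion for 1 + a + a^2 + ... + a^n mod m without any division:

def geom_sum_mod(a, n, m):
    # 1 + a + a^2 + ... + a^n (mod m)
    if n == 0:
        return 1 % m
    if n % 2 == 1:
        # Odd n: 1 + a + ... + a^(2t+1) = (1 + a) * (1 + a^2 + (a^2)^2 + ... + (a^2)^t)
        return (1 + a) * geom_sum_mod(a * a % m, (n - 1) // 2, m) % m
    # Even n: peel off the last term a^n and reduce to the odd case.
    return (pow(a, n, m) + geom_sum_mod(a, n - 1, m)) % m

print(geom_sum_mod(2, 3, 1000))  # 15 = 1 + 2 + 4 + 8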
As you've noted, doing the calculation for an arbitrary modulus m is difficult because many values might not have a multiplicative inverse mod m. However, if you can solve it for a carefully selected set of alternate moduli, you can combine them to obtain a solution mod m.
Factor m into p_1, p_2, p_3 ... p_n such that each p_i is a power of a distinct prime
Since each p is a distinct prime power, they are pairwise coprime. If we can calculate the sum of the series with respect to each modulus p_i, we can use the Chinese Remainder Theorem to reassemble them into a solution mod m.
For each prime power modulus, there are two trivial special cases:
If i^m is congruent to 0 mod p_i, the sum is trivially 0.
If i^m is congruent to 1 mod p_i, then the sum is congruent to k mod p_i.
For other values, one can apply the usual formula for the sum of a geometric sequence:
S = sum(j=0 to k, (i^m)^j) = ((i^m)^(k+1) - 1) / (i^m - 1)
TODO: Prove that (i^m - 1) is coprime to p_i, or find an alternate solution for when they have a nontrivial GCD. Hopefully the fact that p_i is a prime power and also a divisor of m will be of some use... If p_i is a divisor of i, the condition holds. If p_i is prime (as opposed to a prime power), then either the special case i^m = 1 applies, or (i^m - 1) has a multiplicative inverse.
If the geometric sum formula isn't usable for some p_i, you could rearrange the calculation so you only need to iterate from 1 to p_i instead of 1 to k, taking advantage of the fact that the terms repeat with a period no longer than p_i.
(Since your series doesn't contain a j=0 term, the value you want is actually S-1.)
This yields a set of congruences mod p_i, which satisfy the requirements of the CRT.
The procedure for combining them into a solution mod m is described in the above link, so I won't repeat it here.
This can be done via the method of repeated squaring, which is O(log(k)) time, or O(log(k)log(m)) time, if you consider m a variable.
In general, a[n]=1+b+b^2+... b^(n-1) mod m can be computed by noting that:
a[j+k]==b^{j}a[k]+a[j]
a[2n]==(b^n+1)a[n]
The second is just a corollary of the first.
In your case, b=i^m can be computed in O(log m) time.
The following Python code implements this:
def geometric(n, b, m):
    # Computes 1 + b + b^2 + ... + b^(n-1) (mod m) by repeated squaring.
    T = 1
    e = b % m
    total = 0
    while n > 0:
        if n & 1 == 1:
            total = (e * total + T) % m
        T = ((e + 1) * T) % m
        e = (e * e) % m
        n = n // 2
        # print('{} {} {}'.format(total, T, e))
    return total
This bit of magic has a mathematical reason - the operation on pairs defined as
(a,r)#(b,s)=(ab,as+r)
is associative, and the first rule above basically means that:
(b,1)#(b,1)#... n times ... #(b,1)=(b^n,1+b+b^2+...+b^(n-1))
Repeated squaring always works when operations are associative. In this case, the # operator is O(log(m)) time, so repeated squaring takes O(log(n)log(m)).
One way to look at this is via matrix exponentiation:
[[b,1],[0,1]]^n == [[b^n, 1+b+...+b^(n-1)], [0,1]]
You can use a similar method to compute (a^n-b^n)/(a-b) modulo m because matrix exponentiation gives:
[[b,1],[0,a]]^n == [[b^n,a^(n-1)+a^(n-2)b+...+ab^(n-2)+b^(n-1)],[0,a^n]]
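A small Python sketch of this matrix view (my own illustration): raise the 2x2 matrix to the n-th power by repeated squaring and read off the top-right entry.

def mat_mult(X, Y, m):
    # Product of two 2x2 matrices modulo m.
    return [[(X[0][0] * Y[0][0] + X[0][1] * Y[1][0]) % m,
             (X[0][0] * Y[0][1] + X[0][1] * Y[1][1]) % m],
            [(X[1][0] * Y[0][0] + X[1][1] * Y[1][0]) % m,
             (X[1][0] * Y[0][1] + X[1][1] * Y[1][1]) % m]]

def mat_pow(X, n, m):
    # Repeated squaring on 2x2 matrices modulo m.
    result = [[1, 0], [0, 1]]
    while n > 0:
        if n & 1:
            result = mat_mult(result, X, m)
        X = mat_mult(X, X, m)
        n >>= 1
    return result

# (a^n - b^n) / (a - b) mod m is the top-right entry of [[b,1],[0,a]]^n
a, b, n, m = 3, 2, 5, 1000
print(mat_pow([[b, 1], [0, a]], n, m)[0][1])  # 211 = (3^5 - 2^5) / (3 - 2)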
Based on the approach of @braindoper, a complete algorithm which calculates
1 + a + a^2 + ... +a^n mod m
looks like this in Mathematica:
geometricSeriesMod[a_, n_, m_] :=
  Module[{q = a, exp = n, factor = 1, sum = 0, temp},
    While[And[exp > 0, q != 0],
      If[EvenQ[exp],
        temp = Mod[factor*PowerMod[q, exp, m], m];
        sum = Mod[sum + temp, m];
        exp--];
      factor = Mod[Mod[1 + q, m]*factor, m];
      q = Mod[q*q, m];
      exp = Floor[exp/2];
    ];
    Return[Mod[sum + factor, m]]
  ]
Parameters:
a is the "ratio" of the series. It can be any integer (including zero and negative values).
n is the highest exponent of the series. Allowed are integers >= 0.
m is the integer modulus != 0
Note: The algorithm performs a Mod operation after every arithmetic operation. This is essential, if you transcribe this algorithm to a language with a limited word length for integers.
