Given a number N, find the number of ways to write it as a sum of two or more consecutive integers - algorithm

Here is a problem that was tagged dynamic-programming: given a number N, find the number of ways to write it as a sum of two or more consecutive integers. For example, 15 = 7+8 = 1+2+3+4+5 = 4+5+6.
I solved it with math like this:
a + (a + 1) + (a + 2) + (a + 3) + ... + (a + k) = N
(k + 1)*a + (1 + 2 + 3 + ... + k) = N
(k + 1)a + k(k+1)/2 = N
(k + 1)*(2*a + k)/2 = N
So (k + 1)*(2*a + k) = 2*N, and for each divisor (k + 1) of 2*N I check whether it gives a positive integer a. That finds the answer in O(sqrt(N)) time.
Here is my question: how can you solve this with dynamic programming, and what is its complexity (big O)?
P.S.: excuse me if this is a duplicate question. I searched but couldn't find one.
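A minimal Python sketch of the O(sqrt(N)) idea above, assuming all terms are positive. Rather than enumerating divisors of 2*N, it loops over the number of terms t = k + 1 and checks whether the first term a comes out as a positive integer. The function name is illustrative, not from the question.

```python
def count_consecutive_sums(n):
    """Count ways to write n as a sum of two or more consecutive positive integers."""
    count = 0
    t = 2  # number of terms, t = k + 1 in the derivation above
    # a + (a+1) + ... + (a+t-1) = t*a + t*(t-1)/2, and a >= 1 is required,
    # so t*(t-1)/2 must stay below n; this bounds t by roughly sqrt(2n)
    while t * (t - 1) // 2 < n:
        remainder = n - t * (t - 1) // 2
        if remainder % t == 0:  # then a = remainder / t is a positive integer
            count += 1
        t += 1
    return count

print(count_consecutive_sums(15))  # counts 7+8, 4+5+6 and 1+2+3+4+5
```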

The accepted answer was great but the better approach wasn't clearly presented. Posting my java code as below for reference. It might be quite verbose, but explains the idea more clearly. This assumes that the consecutive integers are all positive.
private static int count(int n) {
    // the window is i..j (inclusive) and sum is its total
    int i = 1, j = 1, count = 0, sum = 1;
    while (j < n) {
        if (sum == n) { // matched, move sub-array section forward by 1
            count++;
            sum -= i;
            i++;
            j++;
            sum += j;
        } else if (sum < n) { // not matched yet, extend sub-array at end
            j++;
            sum += j;
        } else { // exceeded, reduce sub-array at start
            sum -= i;
            i++;
        }
    }
    return count;
}

We can use dynamic programming to calculate the sums of 1+2+3+...+K for all K up to N. sum[i] below represents the sum 1+2+3+...+i.
sum = [0]
for i in 1..N:
    append sum[i-1] + i to sum
With these sums we can quickly find all sequences of consecutive integers summing to N. The sum (i+1)+(i+2)+...+j is equal to sum[j] - sum[i]. If the sum is less than N, we increment j. If the sum is greater than N, we increment i. If the sum is equal to N, we increment our counter (provided the sequence has at least two terms) and both i and j.
i = 0
j = 0
count = 0
while j <= N:
    cur_sum = sum[j] - sum[i]
    if cur_sum == N and j - i >= 2:
        count += 1
    if cur_sum <= N:
        j += 1
    if cur_sum >= N:
        i += 1
There are better alternatives than using this dynamic programming solution though. The sum array can be calculated mathematically using the formula k(k+1)/2, so we could calculate it on-the-fly without need for the additional storage. Even better though, since we only ever shift the end-points of the sum we're working with by at most 1 in each iteration, we can calculate it even more efficiently on the fly by adding/subtracting the added/removed values.
i = 0
j = 0
sum = 0
count = 0
while j <= N:
    cur_sum = sum    # sum holds (i+1) + (i+2) + ... + j
    if cur_sum == N and j - i >= 2:
        count += 1
    if cur_sum <= N:
        j += 1
        sum += j
    if cur_sum >= N:
        i += 1
        sum -= i

For odd N, this problem is equivalent to finding the number of divisors of N not exceeding sqrt(N). (For even N, there are a couple of twists.) That task takes O(sqrt(N)/ln(N)) if you have access to a list of primes, O(sqrt(N)) otherwise.
I don't see how dynamic programming can help here.

In order to solve the problem we will try all sums of consecutive integers in [1, M], where M is derived from M(M+1)/2 = N.
int count = 0
for i in [1,M]
    for j in [i, M]
        s = sum(i, j) // s = i + (i+1) + ... + (j-1) + j
        if s == N
            count++
        if s >= N
            break
return count
Since we do not want to calculate sum(i, j) from scratch in every iteration, we'll use a technique known as "memoization". Let's create a matrix of integers sum[M+1][M+1] and set sum[i][j] to i + (i+1) + ... + (j-1) + j.
for i in [1, M]
    sum[i][i] = i
int count = 0
for i in [1, M]
    for j in [i + 1, M]
        sum[i][j] = sum[i][j-1] + j
        if sum[i][j] == N
            count++
        if sum[i][j] >= N
            break
return count
The complexity is obviously O(M^2), i.e. O(N)

1) For n >= 0 an integer, the sum of integers from 0 to n is n*(n+1)/2. This is classic : write this sum first like this :
S = 0 + 1 + ... + n
and then like this :
S = n + (n-1) + ... + 0
You see that 2*S is equal to (0+n) + (1 + (n-1)) + ... + (n+0) = (n+1)*n, so that S = n*(n+1)/2 indeed. (This is well known, but I preferred my answer to be self-contained.)
2) From 1), if we write cons(m,n) for the sum m+(m+1)+...+(n-1)+n of consecutive positive integers, with 1<=m<=n, we see that:
cons(m,n) = (0+1+...+n) - (0+1+...+(m-1)) which gives, from 1):
cons(m,n) = n*(n+1)/2 - m*(m-1)/2
3) The question is then recast as follows: in how many ways can we write N in the form N = cons(m,n) with m,n integers such that 1<=m<=n? If we have N = cons(m,n), this is equivalent to m^2 - m + (2*N - n^2 - n) = 0, that is, the polynomial T^2 - T + (2*N - n^2 - n) has an integer root, m, so its discriminant delta must be a square. But we have:
delta = 1 - 4*(2*N - n^2 - n)
And this delta is an integer which must be a square. There exists therefore an integer M >= 0 such that:
delta = 1 - 4*(2*N - n^2 - n) = M^2
that is
M^2 = 1 - 8*N + 4*n*(n+1)
Both 8*N and 4*n*(n+1) are even, and therefore M^2 is odd, implying that M must be odd.
4) Rewrite N = cons(m,n) as an equation in n:
n^2 + n - (2*N + m*(m-1)) = 0
This shows that the polynomial X^2 + X - (2*N + m*(m-1)) has an integer root, n, so its discriminant gamma must be a square, but:
gamma = 1 + 4*(2*N + m*(m-1))
and this must be a square, so that here again, there exists an integer G >= 0 such that
G^2 = 1 + 4*(2*N + m*(m-1))
which shows that G is odd also.
5) Subtracting M^2 = 1 - 4*(2*N - n*(n+1)) from G^2 = 1 + 4*(2*N + m*(m-1)) yields:
G^2 - M^2 = 4*(2*N + m*(m-1)) + 4*(2*N -n*(n+1))
= 16*N + 4*( m*(m-1) - n*(n+1) )
= 16*N - 8*N (because N = cons(m,n))
= 8*N
And finally this can be rewritten as :
(G-M)*(G+M) = 8*N, that is
[(G-M)/2]*[(G+M)/2] = 2*N
where (G-M)/2 and (G+M)/2 are integers (G-M and G+M are even since G and M are odd)
6) Thus, to each way of writing N as cons(m,n), we can associate a way (and only one way, as M and G are uniquely determined) to factor 2*N into the product x*y, with x = (G-M)/2 and y = (G+M)/2, where G and M are two odd integers. Since G = x + y and M = y - x are odd, x and y must have opposite parities: one is even and the other is odd. Let c be the odd one among x and y, and d be the even one. Then 2*N = c*d, thus N = c*(d/2). So c is an odd number dividing N, and it is uniquely determined by the pair (m,n). Conversely, given any odd divisor of N, one can reverse engineer all this to find n and m.
7) *Conclusion: there is a one-to-one correspondence between the ways of writing N = cons(m,n) (that is, the ways of writing N as a sum of consecutive positive integers, as we have seen) and the odd divisors of N.*
8) Finally, the number we are looking for is the number of odd divisors of N. This count includes the trivial one-term sum N = cons(N,N), so subtract 1 if two or more terms are required. I guess that solving this one by DP or whatever is easier than solving the previous one.
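The correspondence can be checked numerically. The Python sketch below is only an illustration under the same assumption as the answer (positive terms, one-term sums allowed); both function names are made up for this example. It counts odd divisors by trial division and compares the result with a brute-force count of pairs (m, k) satisfying cons(m, k) = N.

```python
def count_odd_divisors(n):
    """Number of odd divisors of n, by trial division on the odd part of n."""
    while n % 2 == 0:  # factors of 2 do not affect the odd divisors
        n //= 2
    count = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            count += 1 if d * d == n else 2  # counts d and n // d
        d += 2  # n is odd now, so only odd candidates matter
    return count

def count_cons_brute(n):
    """Count pairs 1 <= m <= k with m + (m+1) + ... + k == n by brute force."""
    return sum(
        1
        for m in range(1, n + 1)
        for k in range(m, n + 1)
        if (k * (k + 1) - m * (m - 1)) // 2 == n
    )

# 15 has the odd divisors 1, 3, 5, 15; the four sums are 15, 7+8, 4+5+6, 1+2+3+4+5
print(count_odd_divisors(15), count_cons_brute(15))
```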

When you think it upside down (Swift)...
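func cal(num : Int) -> Int {
    // The largest term of a sum of two or more positive consecutive
    // integers is at most ceil(num / 2).
    let halfInt = (num + 1) / 2
    var noInt = 0
    // Try each candidate largest term, from halfInt down to 2.
    for obj in stride(from: halfInt, through: 2, by: -1) {
        var sum : Int = 0
        // Add terms downward from obj until the sum reaches or passes num.
        for subVal in (1...obj).reversed() {
            sum = sum + subVal
            if sum > num {
                break
            }
            if sum == num {
                noInt += 1
                break
            }
        }
    }
    return noInt
}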


Having a difficult time figuring out how to find running time in Theta notation of a nested for loop

I have a triple nested for loop, that needs to be expressed in Theta for running time (complexity).
I ended up with Theta(n^3) but not 100% sure if my reasoning is correct.
x = 0
for i = 1 to n do
    for j = 1 to i do
        for k = j to i + j do
            x <- x + 1
return x
You can solve this step by step from the inner to the outermost loop:
x = 0
for i = 1 to n do
    for j = 1 to i do
        for k = j to i + j do
            x <- x + 1
return x
Start from the inner-most statement. x <- x + 1 is a single instruction, so
x = 0
for i = 1 to n do
    for j = 1 to i do
        for k = j to i + j do
            Theta(1)
return x
The next loop runs (i + j) - j + 1 = i + 1 times:
x = 0
for i = 1 to n do
    for j = 1 to i do
        Theta(i)
return x
Proceeding in this way, we get:
x = 0
for i = 1 to n do
    Theta(i^2)
return x
and finally, using the fact that the sum of squares from 1 to n is n(n + 1)(2n + 1)/6 = n^3/3 + n^2/2 + n/6 = Theta(n^3):
x = 0
Theta(n^3)
return x
Since all remaining statements run in constant time, the overall complexity is Theta(n^3).
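As a sanity check, the loop can simply be run and counted. This Python sketch is an illustration, not part of the original answer: it mirrors the pseudocode and compares the result with the exact count, which follows from summing i * (i + 1) over i = 1..n and equals n(n + 1)(n + 2)/3.

```python
def count_increments(n):
    """Run the triple loop from the question and return the final value of x."""
    x = 0
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            for k in range(j, i + j + 1):  # k = j to i + j inclusive: i + 1 iterations
                x += 1
    return x

for n in (1, 5, 10, 50):
    exact = n * (n + 1) * (n + 2) // 3  # sum over i of i * (i + 1)
    # the last column approaches 1/3 as n grows, consistent with Theta(n^3)
    print(n, count_increments(n), exact, count_increments(n) / n**3)
```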
Let's see. Assume that n is large (otherwise complexity would not be an issue). Your outer loop has n steps. The intermediate loop has i steps, which is roughly n / 2 on average.
To understand the inner loop, let's think about its boundaries. The end is i + j and the start is j, so the number of iterations is i + j - j + 1, which is i + 1. This is as small as 2 when i = 1 and as large as n + 1 when i = n, so it is also roughly n / 2 on average.
Constant factors do not affect asymptotic growth: c * n^3 is Theta(n^3) for any finite, strictly positive constant c.
So, even with the factors of 1 / 2, the total is proportional to n^3. Your result is correct, we have Theta(n^3); we do not care about adding lower-order terms or multiplying by a positive constant.

Algorithms - are double ended selection sorts really faster than single ended ones?

A double-ended selection sort, one that swaps both the min and the max, is claimed to be faster than an ordinary selection sort, even though the number of comparisons is the same. I understand that it gets rid of some of the looping, but if the number of comparisons stays the same, how is it faster?
Thanks in advance
Here are implementations of selection sort and double-ended selection sort that count the comparisons performed.
If you run it, you'll see that double-ended selection sort performs more comparisons than regular selection sort (for any input of two or more items).
import random
def selsort(xs):
    N = len(xs)
    comparisons = 0
    for i in range(N):
        m = i  # index of the smallest item seen so far
        for j in range(i+1, N):
            comparisons += 1
            if xs[j] < xs[m]: m = j
        xs[i], xs[m] = xs[m], xs[i]
    return comparisons
def deselsort(xs):
    N = len(xs)
    comparisons = 0
    for i in range(N//2):
        M = m = i  # indices of the largest and smallest items seen so far
        for j in range(i+1, N-i):
            comparisons += 2
            if xs[j] < xs[m]: m = j
            if xs[j] >= xs[M]: M = j
        xs[i], xs[m] = xs[m], xs[i]
        if M == i: M = m  # the max was at i and has just been moved to m
        xs[N-i-1], xs[M] = xs[M], xs[N-i-1]
    return comparisons
for rr in range(1, 30):
    xs = list(range(rr))
    xs0 = xs[:]
    xs1 = xs[:]
    print(len(xs), selsort(xs0), deselsort(xs1))
    assert xs0 == sorted(xs0), xs0
    assert xs1 == sorted(xs1), xs1
That's because the number of comparisons for regular selection sort is:
(n-1) + (n-2) + ... + 1 = n(n-1)/2
For double-ended selection sort, the number of comparisons is (for odd n -- the even case is similar)
2(n-1) + 2(n-3) + 2(n-5) + ... + 2(2)
= (n-1)+(n-2)+1 + (n-3)+(n-4)+1 + ... 2+1+1
= ((n-1) + (n-2) + ... + 1) + (n-1)/2
= n(n-1)/2 + (n-1)/2
(Here, I'm rewriting each term 2(n-i) as (n-i) + (n-i-1) + 1)
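The two closed forms can be checked without sorting anything, since the comparison counts depend only on the loop bounds. The Python sketch below is an illustration with made-up function names. It also assumes that the even case works out to n/2 extra comparisons, so that floor(n/2) covers both parities; the assertions check this for small n.

```python
def selsort_comparisons(n):
    """Comparisons made by the single-ended loop structure on n items."""
    return sum(n - i - 1 for i in range(n))

def deselsort_comparisons(n):
    """Comparisons made by the double-ended loop structure on n items."""
    # pass i scans j = i+1 .. n-i-1 and does two comparisons per element
    return sum(2 * (n - 2 * i - 1) for i in range(n // 2))

for n in range(1, 30):
    assert selsort_comparisons(n) == n * (n - 1) // 2
    assert deselsort_comparisons(n) == n * (n - 1) // 2 + n // 2

print(selsort_comparisons(5), deselsort_comparisons(5))  # 10 and 12
```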

Time complexity of the algorithm?

This is the algorithm: I think its time complexity is O(n^2) because of loop in loop. How can I explain that?
FindSum(array, n, t)
    i := 0
    found := 0
    array := quick_sort(array, 0, n - 1)
    while i < n - 2
        j = i + 1
        k = n - 1
        while k > j
            sum = array[i] + array[j] + array[k]
            if sum == t
                found += 1
                k -= 1
                j += 1
            else if sum > t
                k -= 1
            else
                j += 1
        i += 1
    return found
Yes, the complexity is indeed O(n^2).
The inner loop runs anywhere between (k-j)/2 = (n-i-2)/2 and k-j = n-1-(i+1) = n-i-2 iterations.
Summing it up for all possible values of i from 0 to n-2 gives you:
T = n-0-2 + n-1-2 + n-2-2 + ... + n-(n-2)-2
= n-2 + n-3 + ... + 0
This is the sum of an arithmetic progression, which equals (n-1)(n-2)/2 and is quadratic in n. Note that dividing by an extra 2 (for the "best" case of the inner loop) does not change the time complexity in terms of big O notation.
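A small Python sketch of the same two-pointer scan, instrumented to count inner-loop iterations, makes the bound concrete. It is an illustration only: `sorted` stands in for `quick_sort`, and the iteration counter and tuple return value are additions that are not in the question's pseudocode.

```python
def find_sum(array, t):
    """Two-pointer scan from the question; returns (found, inner_iterations)."""
    array = sorted(array)  # stands in for quick_sort in the pseudocode
    n = len(array)
    found = 0
    iterations = 0
    for i in range(n - 2):
        j, k = i + 1, n - 1
        while k > j:
            iterations += 1
            total = array[i] + array[j] + array[k]
            if total == t:
                found += 1
                k -= 1
                j += 1
            elif total > t:
                k -= 1
            else:
                j += 1
    return found, iterations

found, iterations = find_sum([5, 1, 4, 2, 3], 9)
n = 5
# iterations never exceeds (n-1)(n-2)/2, the sum derived above
print(found, iterations, (n - 1) * (n - 2) // 2)
```

With a target larger than any possible sum, only the `j += 1` branch runs, so the inner loop takes the full n-i-2 steps and the bound is met exactly.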

Calculating the number of times an if statement is executed

This code counts how many integer triples sum to 0: The full code is here.
initialise an int array of length n
int cnt = 0 // cnt is the number of triples that sum to 0
for (int i = 0; i < n; i++) {
    for (int j = i+1; j < n; j++) {
        for (int k = j+1; k < n; k++) {
            if (array[i]+array[j]+array[k] == 0) {
                cnt++;
            }
        }
    }
}
Now, from the book Algorithms by Robert Sedgewick, I read that:
The initialisation of cnt to 0 is executed exactly once.
cnt++ is executed from 0 to the number of times a triple is found.
The if statement is executed n(n-1)(n-2)/6 times.
I've done some experiments and all of them are true. But I completely don't know how they calculate the number of times the if statement got executed.
I'm not sure, but I think that:
n means from i to n
(n-1) means from i+1 to n
(n-2) means from j+1 to n
/6 I don't know what's this for.
Can anyone explain how to calculate this?
It's sum of sums.
The inner loop is executed n-j-1 times each time it is being reached
The middle loop is executed n-i-1 times each time it is being reached
The outer loop is executed n times.
Sum all of these and you get the total number of times the if statement is executed.
Note that the number of times the middle loop is executed each time is NOT n-1, it is n-i-1, where i is the index of the outer loop. Similarly for the inner loop.
The /6 factor is coming from taking it into account in the summation formula.
First loop executes for N times (0 to N-1)
Time to execute outer loop is:
Fi(0) + Fi(1) + Fi(2)...Fi(N-1)
When i is 0, middle loop executes N-1 times (1 to N-1)
When i is 1, middle loop executes N-2 times (2 to N-1)
Time to execute middle loop is:
Fi(0) = Fj(1) + Fj(2) ... Fj(N-1)
Fi(1) = Fj(2) + Fj(3) ... Fj(N-1)
Fi(0) + Fi(1) + Fi(2)...Fi(N-1) = Fj(1) + 2Fj(2) + ... (N-1)Fj(N-1)
Now come to the inner most loop:
When j is 1, inner loop executes N-2 times (2 to N-1)
When j is 2, inner loop executes N-3 times (3 to N-1)
The if statement runs once per inner iteration, so Fj(j) = N-1-j:
Fj(1) = N-2
Fj(2) = N-3
Fj(1) + 2Fj(2) + ... (N-1)Fj(N-1) = 1 x (N-2) + 2 x (N-3) + ... + (N-2) x 1 + (N-1) x 0
= (N-1) (1 + 2 + ... + (N-2)) - (1x1 + 2x2 + ... + (N-2)x(N-2))
= (N-1) (N-2) (N-1) / 2 - (N-2) (N-1) (2N-3) / 6
= (N-1) (N-2) (3(N-1) - (2N-3)) / 6
= N (N-1) (N-2) / 6
You may want to also check: Formula to calculate the sum of squares of first N natural numbers and sum of first N natural numbers.
Alternate explanation:
You are counting all triplets i < j < k. These can be chosen in C(N, 3) ways, i.e. (N) * (N-1) * (N-2) / (1 * 2 * 3) ways.
This can be viewed as a combinatorial problem. To pick 3 unique items from n items (k=3 in the linked article) gives n!/(n-3)! = n*(n-1)*(n-2) possibilities. However, in the code the order of the 3 items doesn't matter. For each combination of 3 items, there are 3! = 6 permutations. So we need to divide by 6 to get only orderless possibilities. So we get n!/(3!(n-3)!) = n(n-1)(n-2)/6
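The count can be confirmed by running the same loop structure and comparing it with the binomial coefficient. This Python sketch is an illustration (the function name is made up); `math.comb` requires Python 3.8 or later.

```python
from math import comb

def count_if_executions(n):
    """Count how many times the innermost if statement is reached."""
    executions = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                executions += 1  # the if statement would be evaluated here
    return executions

for n in range(0, 12):
    assert count_if_executions(n) == n * (n - 1) * (n - 2) // 6 == comb(n, 3)

print(count_if_executions(10), comb(10, 3))  # both are 120
```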
The basis of this formula comes from the sum of a progression:
1+2 = 3
1+2+3 = 6
1+2+3+4 = 10
There exists the Formula:
Sum(1..N) == N*(N+1)/2
1+2+3+4 = 4*5/2 = 10
With a recursive progression (like in this case) you get another formula for the sums.
In your code, where i runs from 0 to n, j from i to n, k from j to n, the if statement is executed about n^3 / 6 times. To see why that is so, look at this code which will obviously execute the if statement just as often:
int cnt = 0 // cnt is the number of triples that sum to 0
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        for (int k = 0; k < n; k++) {
            if (j > i && k > j) {
                if (array[i]+array[j]+array[k] == 0) {
                    cnt++;
                }
            }
        }
    }
}
The inner loop now obviously executes n^3 times. The if statement is executed if i < j < k. We ignore the case that i == j or i == k or j == k. The three variables i, j and k could be sorted in six different orders (i < j < k, i < k < j, j < i < k etc.). Since each of these six different sorting orders happens equally often, about n^3 / 6 times we have the order i < j < k.

Any faster algorithm to compute the number of divisors

The F series is defined as
F(0) = 1
F(1) = 1
F(i) = i * F(i - 1) * F(i - 2) for i > 1
The task is to find the number of different divisors for F(i)
This question is from Timus. I tried the following Python, but it surely gives a time limit exceeded. This brute-force approach will not work for a large input, since it will cause integer overflow as well.
#!/usr/bin/env python3
from math import sqrt
n = int(input())
def f(n):
    # F(0) = F(1) = 1, F(i) = i * F(i-1) * F(i-2)
    if n == 0:
        return 1
    if n == 1:
        return 1
    a = 1
    b = 1
    for i in range(2, n + 1):
        k = i * a * b
        a = b
        b = k
    return b
x = f(n)
cnt = 0
# count divisors in pairs (i, x // i) for i up to sqrt(x)
for i in range(1, int(sqrt(x)) + 1):
    if x % i == 0:
        if x // i == i:
            cnt += 1
        else:
            cnt += 2
print(cnt)
Any optimization?
I have tried the suggestion, and rewrite the solution: (not storing the F(n) value directly, but a list of factors)
#!/usr/bin/env python3
#from math import sqrt
T = 10000
# sieve of Eratosthenes: primes[i] stays truthy only if i is prime
primes = list(range(T))
primes[0] = False
primes[1] = False
primes[2] = True
primes[3] = True
for i in range(T):
    if primes[i]:
        j = i + i
        while j < T:
            primes[j] = False
            j += i
p = []
for i in range(T):
    if primes[i]:
        p.append(i)
n = int(input())
def f(n):
    global p
    if n == 1:
        return 1
    a = dict()  # prime -> exponent for F(i-2)
    b = dict()  # prime -> exponent for F(i-1)
    for i in range(2, n + 1):
        # exponents of F(i) = exponents of F(i-2) + F(i-1) + i
        c = a.copy()
        for y in b:
            if y in c:
                c[y] += b[y]
            else:
                c[y] = b[y]
        k = i
        for y in p:
            d = 0
            if k % y == 0:
                while k % y == 0:
                    k //= y
                    d += 1
                if y in c:
                    c[y] += d
                else:
                    c[y] = d
            if k < y: break
        a = b
        b = c
    k = 1
    for i in b:
        k = k * (b[i] + 1) % (1000000007)
    return k
print(f(n))
It still gives TL5 (not fast enough), but it solves the problem of overflow for the value F(n).
First see this wikipedia article on the divisor function. In short, if you have a number and you know its prime factors, you can easily calculate the number of divisors (get SO to do TeX math):
$n = \prod_{i=1}^r p_i^{a_i}$
$\sigma_x(n) = \prod_{i=1}^{r} \frac{p_{i}^{(a_{i}+1)x}-1}{p_{i}^x-1}$
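Anyway, it's a simple function. For the number of divisors (x = 0) the quotient is replaced by its limit, giving $\sigma_0(n) = \prod_{i=1}^{r} (a_i + 1)$.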
Now, to solve your problem, instead of keeping F(n) as the number itself, keep it as a set of prime factors and exponent sizes. Then the function that calculates F(n) simply takes the two sets for F(n-1) and F(n-2), sums the exponents of the same prime factors in both sets (assuming zero for nonexistent ones) and additionally adds the set of prime factors and exponent sizes for the number i. This means that you need another simple [1] function to find the prime factors of i.
Computing F(n) this way, you just need to apply the above formula (taken from Wikipedia) to the set and there's your value. Note also that F(n) can quickly get very large. This solution also avoids usage of big-num libraries (since no prime factor nor its exponent is likely to go beyond 4 billion [2]).
[1] Of course this is not so simple for arbitrarily large i, otherwise we wouldn't have any form of security right now, but for your application it should be simple enough.
[2] Well it might. If you happen to figure out a simple formula answering your question given any n, then large ns would also be possible in the test case, for which this algorithm is likely going to give a time limit exceeded.
That is a fun problem.
The F(n) grow extremely fast. Since F(n) <= F(n+1) for all n, we have
F(n+2) > F(n)²
for all n, and thus
F(n) > 2^(2^(n/2-1))
for n > 2. That crude estimate already shows that one cannot store these numbers for any but the smallest n. By that F(100) requires more than (2^49) bits of storage, and 128 GB are only 2^40 bits. Actually, the prime factorisation of F(100) is
*Fiborial> fiborials !! 100
and that would require about 9.6 * 10^20 (roughly 2^70) bits - a little less than half of them are trailing zeros, but even storing the numbers à la floating point numbers with a significand and an exponent doesn't bring the required storage down far enough.
So instead of storing the numbers themselves, one can consider the prime factorisation. That also allows an easier computation of the number of divisors, since
$\mathrm{divisors}(n) = \prod_{i=1}^{k} (e_i + 1) \quad \text{if} \quad n = \prod_{i=1}^{k} p_i^{e_i}$
Now, let us investigate the prime factorisations of the F(n) a little. We begin with the
Lemma: A prime p divides F(n) if and only if p <= n.
That is easily proved by induction: F(0) = F(1) = 1 is not divisible by any prime, and there are no primes <= 1.
Now suppose that n > 1 and
A(k) = The prime factors of F(k) are exactly the primes <= k
holds for k < n. Then, since
F(n) = n * F(n-1) * F(n-2)
the set of prime factors of F(n) is the union of the sets of prime factors of n, F(n-1) and F(n-2).
By the induction hypothesis, the set of prime factors of F(k) is
P(k) = { p | 1 < p <= k, p prime }
for k < n. Now, if n is composite, all prime factors of n are smaller than n, hence the set of prime factors of F(n) is P(n-1), but since n is not prime, P(n) = P(n-1). If, on the other hand, n is prime, the set of prime factors of F(n) is
P(n-1) ∪ {n} = P(n)
With that, let us see how much work it is to track the prime factorisation of F(n) at once, and update the list/dictionary for each n (I ignore the problem of finding the factorisation of n, that doesn't take long for the small n involved).
The entry for the prime p appears first for n = p, and is then updated for each further n, altogether it is created/updated N - p + 1 times for F(N). Thus there are
$\sum_{p \le N} (N + 1 - p) = \pi(N) \cdot (N+1) - \sum_{p \le N} p \approx \frac{N^2}{2 \log N}$
updates in total. For N = 10^6, about 3.6 * 10^10 updates, that is way more than can be done in the allowed time (0.5 seconds).
So we need a different approach. Let us look at one prime p alone, and follow the exponent of p in the F(n).
Let v_p(k) be the exponent of p in the prime factorisation of k. Then we have
v_p(F(n)) = v_p(n) + v_p(F(n-1)) + v_p(F(n-2))
and we know that v_p(F(k)) = 0 for k < p. So (assuming p is not too small to understand what goes on):
v_p(F(n)) = v_p(n) + v_p(F(n-1)) + v_p(F(n-2))
v_p(F(p)) = 1 + 0 + 0 = 1
v_p(F(p+1)) = 0 + 1 + 0 = 1
v_p(F(p+2)) = 0 + 1 + 1 = 2
v_p(F(p+3)) = 0 + 2 + 1 = 3
v_p(F(p+4)) = 0 + 3 + 2 = 5
v_p(F(p+5)) = 0 + 5 + 3 = 8
So we get Fibonacci numbers for the exponents, v_p(F(p+k)) = Fib(k+1) - for a while, since later multiples of p inject further powers of p,
v_p(F(2*p-1)) = 0 + Fib(p-1) + Fib(p-2) = Fib(p)
v_p(F(2*p)) = 1 + Fib(p) + Fib(p-1) = 1 + Fib(p+1)
v_p(F(2*p+1)) = 0 + (1 + Fib(p+1)) + Fib(p) = 1 + Fib(p+2)
v_p(F(2*p+2)) = 0 + (1 + Fib(p+2)) + (1 + Fib(p+1)) = 2 + Fib(p+3)
v_p(F(2*p+3)) = 0 + (2 + Fib(p+3)) + (1 + Fib(p+2)) = 3 + Fib(p+4)
but the additional powers from 2*p also follow a nice Fibonacci pattern, and we have v_p(F(2*p+k)) = Fib(p+k+1) + Fib(k+1) for 0 <= k < p.
For further multiples of p, we get another Fibonacci summand in the exponent, so
v_p(F(n)) = ∑ Fib(n + 1 - k*p)
-- until n >= p², because multiples of p² contribute two to the exponent, and the corresponding summand would have to be multiplied by 2; for multiples of p³, by 3 etc.
One can also split the contributions of multiples of higher powers of p, so one would get one Fibonacci summand due to it being a multiple of p, one for it being a multiple of p², one for being a multiple of p³ etc, that yields
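$v_p(F(n)) = \sum_{k=1}^{\lfloor n/p \rfloor} \mathrm{Fib}(n + 1 - k p) + \sum_{k=1}^{\lfloor n/p^2 \rfloor} \mathrm{Fib}(n + 1 - k p^2) + \sum_{k=1}^{\lfloor n/p^3 \rfloor} \mathrm{Fib}(n + 1 - k p^3) + \dots$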
Now, in particular for the smaller primes, these sums have a lot of terms, and computing them that way would be slow. Fortunately, there is a closed formula for sums of Fibonacci numbers whose indices are an arithmetic progression, for 0 < a <= s
∑_{k=0}^{m} Fib(a + k*s) = (Fib(a + (m+1)*s) - (-1)^s * Fib(a + m*s) - (-1)^a * Fib(s - a) - Fib(a)) / D(s)
where
D(s) = Luc(s) - 1 - (-1)^s
and Luc(k) is the k-th Lucas number, Luc(k) = Fib(k+1) + Fib(k-1).
For our purposes, we only need the Fibonacci numbers modulo 10^9 + 7, then the division must be replaced by a multiplication with the modular inverse of D(s).
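The closed formula and the modular inverse can be tried out in a few lines of Python. This is a sketch under assumptions, not the answerer's implementation: the sum is taken over k = 0..m, the helper names are made up, the inverse uses Fermat's little theorem (valid because 10^9 + 7 is prime), and it only works when D(s) is not a multiple of the modulus.

```python
MOD = 10**9 + 7

def fib_mod_table(limit):
    """Return [Fib(0), ..., Fib(limit)] modulo MOD, for limit >= 1."""
    fib = [0, 1]
    for _ in range(limit - 1):
        fib.append((fib[-1] + fib[-2]) % MOD)
    return fib

def fib_ap_sum(a, s, m, fib):
    """Sum of Fib(a + k*s) for k = 0..m, modulo MOD, assuming 0 < a <= s."""
    sign_s = 1 if s % 2 == 0 else -1   # (-1)^s
    sign_a = 1 if a % 2 == 0 else -1   # (-1)^a
    lucas_s = fib[s + 1] + fib[s - 1]  # Luc(s) = Fib(s+1) + Fib(s-1)
    d = (lucas_s - 1 - sign_s) % MOD   # D(s), assumed nonzero modulo MOD
    numerator = (fib[a + (m + 1) * s] - sign_s * fib[a + m * s]
                 - sign_a * fib[s - a] - fib[a]) % MOD
    # division by D(s) becomes multiplication by its modular inverse
    return numerator * pow(d, MOD - 2, MOD) % MOD

fib = fib_mod_table(200)
direct = sum(fib[1 + 3 * k] for k in range(3)) % MOD  # Fib(1) + Fib(4) + Fib(7)
print(fib_ap_sum(1, 3, 2, fib), direct)  # both are 17
```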
Using these facts, the number of divisors of F(n) modulo 10^9+7 can be computed in the allowed time for n <= 10^6 (about 0.06 seconds on my old 32-bit box), although with Python, on the testing machines, further optimisations might be necessary.
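For small n the whole argument can be checked directly. The Python sketch below is an illustration with made-up function names: it computes the exponent v_p(F(n)) once from the recurrence and once from the Fibonacci sums over multiples of p, p², p³ and so on, then multiplies (exponent + 1) over all primes p <= n. It uses exact Fibonacci numbers and plain loops, so it is far too slow for n near 10^6, where the closed formula modulo 10^9 + 7 would be needed.

```python
def multiplicity(k, p):
    """Exponent of the prime p in k."""
    e = 0
    while k % p == 0:
        k //= p
        e += 1
    return e

def exponent_by_recurrence(n, p):
    """v_p(F(n)) from v_p(F(i)) = v_p(i) + v_p(F(i-1)) + v_p(F(i-2))."""
    a, b = 0, 0  # v_p(F(0)) and v_p(F(1))
    for i in range(2, n + 1):
        a, b = b, multiplicity(i, p) + a + b
    return b

def exponent_by_fib_sums(n, p):
    """v_p(F(n)) as one Fibonacci sum per power of p not exceeding n."""
    fib = [0, 1]
    while len(fib) <= n + 1:
        fib.append(fib[-1] + fib[-2])
    total = 0
    q = p
    while q <= n:
        for k in range(1, n // q + 1):
            total += fib[n + 1 - k * q]
        q *= p
    return total

def divisor_count_of_F(n, mod=10**9 + 7):
    """Number of divisors of F(n) modulo mod; primes found by trial division."""
    result = 1
    for p in range(2, n + 1):
        if all(p % d for d in range(2, int(p ** 0.5) + 1)):
            result = result * (exponent_by_fib_sums(n, p) + 1) % mod
    return result

# F(4) = 48 = 2^4 * 3 has 10 divisors; F(5) = 1440 = 2^5 * 3^2 * 5 has 36
print(divisor_count_of_F(4), divisor_count_of_F(5))
```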
