Fastest way to prime factorise a number up to 10^18 - algorithm

Given a number 1 <= n <= 10^18, how can I factorise it in least time complexity?
There are many posts on the internet addressing how you can find prime factors but none of them (at least from what I've seen) state their benefits, say in a particular situation.
I use Pollard's rho algorithm in addition to Eratosthenes' sieve:
Using sieve, find all prime numbers in the first 107 numbers, and then divide n with these primes as much as possible.
Now use Pollard's rho algorithm to try and find the rest of the primes until n is equal to 1.
My Implementation:
#include <iostream>
#include <vector>
#include <cstdio>
#include <ctime>
#include <cmath>
#include <cstdlib>
#include <algorithm>
#include <string>
using namespace std;
typedef unsigned long long ull;
typedef long double ld;
typedef pair <ull, int> pui;
#define x first
#define y second
#define mp make_pair
bool prime[10000005];
vector <ull> p;
void initprime(){
prime[2] = 1;
for(int i = 3 ; i < 10000005 ; i += 2){
prime[i] = 1;
}
for(int i = 3 ; i * i < 10000005 ; i += 2){
if(prime[i]){
for(int j = i * i ; j < 10000005 ; j += 2 * i){
prime[j] = 0;
}
}
}
for(int i = 0 ; i < 10000005 ; ++i){
if(prime[i]){
p.push_back((ull)i);
}
}
}
ull modularpow(ull base, ull exp, ull mod){
ull ret = 1;
while(exp){
if(exp & 1){
ret = (ret * base) % mod;
}
exp >>= 1;
base = (base * base) % mod;
}
return ret;
}
ull gcd(ull x, ull y){
while(y){
ull temp = y;
y = x % y;
x = temp;
}
return x;
}
ull pollardrho(ull n){
srand(time(NULL));
if(n == 1)
return n;
ull x = (rand() % (n - 2)) + 2;
ull y = x;
ull c = (rand() % (n - 1)) + 1;
ull d = 1;
while(d == 1){
x = (modularpow(x, 2, n) + c + n) % n;
y = (modularpow(y, 2, n) + c + n) % n;
y = (modularpow(y, 2, n) + c + n) % n;
d = gcd(abs(x - y), n);
if(d == n){
return pollardrho(n);
}
}
return d;
}
int main ()
{
ios_base::sync_with_stdio(false);
cin.tie(0);
initprime();
ull n;
cin >> n;
ull c = n;
vector <pui> o;
for(vector <ull>::iterator i = p.begin() ; i != p.end() ; ++i){
ull t = *i;
if(!(n % t)){
o.push_back(mp(t, 0));
}
while(!(n % t)){
n /= t;
o[o.size() - 1].y++;
}
}
while(n > 1){
ull u = pollardrho(n);
o.push_back(mp(u, 0));
while(!(n % u)){
n /= u;
o[o.size() - 1].y++;
}
if(n < 10000005){
if(prime[n]){
o.push_back(mp(n, 1));
}
}
}
return 0;
}
Is there any faster way to factor such numbers? If possible, please explain why along with the source code.

Approach
Lets say you have a number n that goes up to 1018 and you want to prime factorise it. Since this number can be as small as unity and as big as 1018, all along it can be prime as well as composite, this would be my approach -
Using miller rabin primality testing, make sure that the number is composite.
Factorise n using primes up to 106, which can be calculated using sieve of Eratosthenes.
Now the updated value of n is such that it has prime factors only above 106 and since the value of n can still be as big as 1018, we conclude that the number is either prime or it has exactly two prime factors (not necessarily distinct).
Run Miller Rabin again to ensure the number isn't prime.
Use Pollard rho algorithm to get one prime factor.
You have the complete factorisation now.
Lets look at the time-complexity of the above approach:
Miller Rabin takes O(log n)
Sieve of Eratosthenes takes O(n*log n)
The implementation of Pollard rho I shared takes O(n^0.25)
Time Complexity
Step 2 takes maximum time which is equal to O(10^7), which is in turn the complexity of the above algorithm. This means you can find the factorisation within a second for almost all programming languages.
Space Complexity
Space is used only in the step 2 where sieve is implemented and is equal to O(10^6). Again, very practical for the purpose.
Implementation
Complete Code implemented in C++14. The code has a hidden bug. You can either reveal it in the next section, or skip towards the challenge ;)
Bug in the code
In line 105, iterate till i<=np. Otherwise, you may miss the cases where prime[np]=999983 is a prime factor
Challenge
Give me a value of n, if any, where the shared code results in wrong prime factorisation.
Bonus
How many such values of n exist ?
Hint
For such value of n, assertion in Line 119 may fail.
Solution
Lets call P=999983. All numbers of the form n = p*q*r where p, q, r are primes >= P such that at least one of them is equal to P will result in wrong prime factorisation.
Bonus Solution
There are exactly four such numbers: {P03, P02P1, P02P2, P0P12}, where P0 = P = 999983, P1 = next_prime(P0) = 1000003, P2 = next_prime(P1) = 1000033.

The fastest solution for 64-bit inputs on modern processors is a small amount of trial division (the amount will differ, but something under 100 is common) followed by Pollard's Rho. You will need a good deterministic primality test using Miller-Rabin or BPSW, and a infrastructure to handle multiple factors (e.g. if a composite is split into more composites). For 32-bit you can optimize each of these things even more.
You will want a fast mulmod, as it is the core of both Pollard's Rho, Miller-Rabin, and the Lucas test. Ideally this is done as a tiny assembler snippet.
Times should be under 1 millisecond to factor any 64-bit input. Significantly faster under 50 bits.
As shown by Ben Buhrow's spBrent implementation, algorithm P2'' from Brent's 1980 paper seems to be as fast as the other implementations I'm aware of. It uses Brent's improved cycle finding as well as the useful trick of delaying GCDs with the necessary added backtracking.
See this thread on Mersenneforum for some messy details and benchmarking of various solutions. I have a number of benchmarks of these and other implementations at different sizes, but haven't published anything (partly because there are so many ways to look at the data).
One of the really interesting things to come out of this was that SQUFOF, for many years believed to be the better solution for the high end of the 64-bit range, no longer is competitive. SQUFOF does have the advantage of only needing a fast perfect-square detector for best speed, which doesn't have to be in asm to be really fast.

Related

Finding kth element in the nth order of Farey Sequence

Farey sequence of order n is the sequence of completely reduced fractions, between 0 and 1 which when in lowest terms have denominators less than or equal to n, arranged in order of increasing size. Detailed explanation here.
Problem
The problem is, given n and k, where n = order of seq and k = element index, can we find the particular element from the sequence. For examples answer for (n=5, k =6) is 1/2.
Lead
There are many less than optimal solution available, but am looking for a near-optimal one. One such algorithm is discussed here, for which I am unable to understand the logic hence unable to apply the examples.
Question
Can some please explain the solution with more detail, preferably with an example.
Thank you.
I've read the method provided in your link, and the accepted C++ solution to it. Let me post them, for reference:
Editorial Explanation
Several less-than-optimal solutions exist. Using a priority queue, one
can iterate through the fractions (generating them one by one) in O(K
log N) time. Using a fancier math relation, this can be reduced to
O(K). However, neither of these solution obtains many points, because
the number of fractions (and thus K) is quadratic in N.
The “good” solution is based on meta-binary search. To construct this
solution, we need the following subroutine: given a fraction A/B
(which is not necessarily irreducible), find how many fractions from
the Farey sequence are less than this fraction. Suppose we had this
subroutine; then the algorithm works as follows:
Determine a number X such that the answer is between X/N and (X+1)/N; such a number can be determined by binary searching the range
1...N, thus calling the subroutine O(log N) times.
Make a list of all fractions A/B in the range X/N...(X+1)/N. For any given B, there is at most one A in this range, and it can be
determined trivially in O(1).
Determine the appropriate order statistic in this list (doing this in O(N log N) by sorting is good enough).
It remains to show how we can construct the desired subroutine. We
will show how it can be implemented in O(N log N), thus giving a O(N
log^2 N) algorithm overall. Let us denote by C[j] the number of
irreducible fractions i/j which are less than X/N. The algorithm is
based on the following observation: C[j] = floor(X*B/N) – Sum(C[D],
where D divides j). A direct implementation, which tests whether any D
is a divisor, yields a quadratic algorithm. A better approach,
inspired by Eratosthene’s sieve, is the following: at step j, we know
C[j], and we subtract it from all multiples of j. The running time of
the subroutine becomes O(N log N).
Relevant Code
#include <cassert>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;
const int kMaxN = 2e5;
typedef int int32;
typedef long long int64_x;
// #define int __int128_t
// #define int64 __int128_t
typedef long long int64;
int64 count_less(int a, int n) {
vector<int> counter(n + 1, 0);
for (int i = 2; i <= n; i += 1) {
counter[i] = min(1LL * (i - 1), 1LL * i * a / n);
}
int64 result = 0;
for (int i = 2; i <= n; i += 1) {
for (int j = 2 * i; j <= n; j += i) {
counter[j] -= counter[i];
}
result += counter[i];
}
return result;
}
int32 main() {
// ifstream cin("farey.in");
// ofstream cout("farey.out");
int64_x n, k; cin >> n >> k;
assert(1 <= n);
assert(n <= kMaxN);
assert(1 <= k);
assert(k <= count_less(n, n));
int up = 0;
for (int p = 29; p >= 0; p -= 1) {
if ((1 << p) + up > n)
continue;
if (count_less((1 << p) + up, n) < k) {
up += (1 << p);
}
}
k -= count_less(up, n);
vector<pair<int, int>> elements;
for (int i = 1; i <= n; i += 1) {
int b = i;
// find a such that up/n < a / b and a / b <= (up+1) / n
int a = 1LL * (up + 1) * b / n;
if (1LL * up * b < 1LL * a * n) {
} else {
continue;
}
if (1LL * a * n <= 1LL * (up + 1) * b) {
} else {
continue;
}
if (__gcd(a, b) != 1) {
continue;
}
elements.push_back({a, b});
}
sort(elements.begin(), elements.end(),
[](const pair<int, int>& lhs, const pair<int, int>& rhs) -> bool {
return 1LL * lhs.first * rhs.second < 1LL * rhs.first * lhs.second;
});
cout << (int64_x)elements[k - 1].first << ' ' << (int64_x)elements[k - 1].second << '\n';
return 0;
}
Basic Methodology
The above editorial explanation results in the following simplified version. Let me start with an example.
Let's say, we want to find 7th element of Farey Sequence with N = 5.
We start with writing a subroutine, as said in the explanation, that gives us the "k" value (how many Farey Sequence reduced fractions there exist before a given fraction - the given number may or may not be reduced)
So, take your F5 sequence:
k = 0, 0/1
k = 1, 1/5
k = 2, 1/4
k = 3, 1/3
k = 4, 2/5
k = 5, 1/2
k = 6, 3/5
k = 7, 2/3
k = 8, 3/4
k = 9, 4/5
k = 10, 1/1
If we can find a function that finds the count of the previous reduced fractions in Farey Sequence, we can do the following:
int64 k_count_2 = count_less(2, 5); // result = 4
int64 k_count_3 = count_less(3, 5); // result = 6
int64 k_count_4 = count_less(4, 5); // result = 9
This function is written in the accepted solution. It uses the exact methodology explained in the last paragraph of the editorial.
As you can see, the count_less() function generates the same k values as in our hand written list.
We know the values of the reduced fractions for k = 4, 6, 9 using that function. What about k = 7? As explained in the editorial, we will list all the reduced fractions in range X/N and (X+1)/N, here X = 3 and N = 5.
Using the function in the accepted solution (its near bottom), we list and sort the reduced fractions.
After that we will rearrange our k values, as in to fit in our new array as such:
k = -, 0/1
k = -, 1/5
k = -, 1/4
k = -, 1/3
k = -, 2/5
k = -, 1/2
k = -, 3/5 <-|
k = 0, 2/3 | We list and sort the possible reduced fractions
k = 1, 3/4 | in between these numbers
k = -, 4/5 <-|
k = -, 1/1
(That's why there is this piece of code: k -= count_less(up, n);, it basically remaps the k values)
(And we also subtract one more during indexing, i.e.: cout << (int64_x)elements[k - 1].first << ' ' << (int64_x)elements[k - 1].second << '\n';. This is just to basically call the right position in the generated array.)
So, for our new re-mapped k values, for N = 5 and k = 7 (original k), our result is 2/3.
(We select the value k = 0, in our new map)
If you compile and run the accepted solution, it will give you this:
Input: 5 7 (Enter)
Output: 2 3
I believe this is the basic point of the editorial and accepted solution.

finding all divisors of all the numbers from 1 to 10^6 efficiently

I need to find all divisors of all numbers between 1 and n (including 1 and n). where n equals 10^6 and I want to store them in the vector.
vector< vector<int> > divisors(1000000);
void abc()
{
long int n=1,num;
while(n<1000000)
{
num=n;
int limit=sqrt(num);
for(long int i=1;i<limit;i++)
{
if(num%i==0)
{
divisors[n].push_back(i);
divisors[n].push_back(num/i);
}
}
n++;
}
}
This is too much time taking as well. Can i optimize it in any way?
const int N = 1000000;
vector<vector<int>> divisors(N+1);
for (int i = 2; i <= N; i++) {
for (j = i; j <= N; j += i) {
divisors[j].push_back(i);
}
}
this runs in O(N*log(N))
Intuition is that upper N/2 numbers are run only once. Then from remaining numbers upper half are run once more ...
Other way around. If you increase N from lets say 10^6 to 10^7, than you have as many opertions as at 10^6 times 10. (that is linear), but what is extra are numbers from 10^6 to 10^7 that doesnt run more than 10 times each at worst.
number of operaions is
sum (N/n for n from 1 to N)
this becomes then N * sum(1/n for n from 1 to N) and this is N*log(N) that can be shown using integration of 1/x over dx from 1 to N
We can see that algorhitm is optimal, because there is as many operation as is number of divisors. Size of result or total number of divisors is same as complexity of algorhitm.
I think this might not be the best solution, but it is much better than the one presented, so here we go:
Go over all the numbers (i) from 1 to n, and for each number:
Add the number to the list of itself.
Set multiplier to 2.
Add i to the list of i * multiplier.
increase multiplier.
Repeat steps 3 & 4 until i * multiplier is greater than n.
[Edit3] complete reedit
Your current approach is O(n^1.5) not O(n^2)
Originally I suggested to see Why are my nested for loops taking so long to compute?
But as Oliver Charlesworth suggested to me to read About Vectors growth That should not be much of an issue here (also the measurements confirmed it).
So no need to preallocating of memroy for the list (it would just waste memory and due to CACHE barriers even lower the overall performance at least on mine setup).
So how to optimize?
either lower the constant time so the runtime is better of your iteration (even with worse complexity)
or lower the complexity so much that overhead is not bigger to actually have some speedup
I would start with SoF (Sieve of Eratosthenes)
But instead setting number as divisible I would add currently iterated sieve to the number divisor list. This should be O(n^2) but with much lower overhead (no divisions and fully parallelisable) if coded right.
start computing SoF for all numbers i=2,3,4,5,...,n-1
for each number x you hit do not update SoF table (you do not need it). Instead add the iterated sieve i to the divisor list of x. Something like:
C++ source:
const int n=1000000;
List<int> divs[n];
void divisors()
{
int i,x;
for (i=1;i<n;i++)
for (x=i;x<n;x+=i)
divs[x].add(i);
}
This took 1.739s and found 13969984 divisors total, max 240 divisors per number (including 1 and x). As you can see it does not use any divisions. and the divisors are sorted ascending.
List<int> is dynamic list of integers template (something like your vector<>)
You can adapt this to your kind of iteration so you can check up to nn=sqrt(n) and add 2 divisors per iteration that is O(n^1.5*log(n)) with different constant time (overhead) a bit slower due to single division need and duplicity check (log(n) with high base) so you need to measure if it speeds things up or not on my setup is this way slower (~2.383s even if it is with better complexity).
const int n=1000000;
List<int> divs[n];
int i,j,x,y,nn=sqrt(n);
for (i=1;i<=nn;i++)
for (x=i;x<n;x+=i)
{
for (y=divs[x].num-1;y>=0;y--)
if (i==divs[x][y]) break;
if (y<0) divs[x].add(i);
j=x/i;
for (y=divs[x].num-1;y>=0;y--)
if (j==divs[x][y]) break;
if (y<0) divs[x].add(j);
}
Next thing is to use direct memory access (not sure you can do that with vector<>) my list is capable of such thing do not confuse it with hardware DMA this is just avoidance of array range checking. This speeds up the constant overhead of the duplicity check and the result time is [1.793s] which is a little bit slower then the raw SoF O(n^2) version. So if you got bigger n this would be the way.
[Notes]
If you want to do prime decomposition then iterate i only through primes (in that case you need the SoF table) ...
If you got problems with the SoF or primes look at Prime numbers by Eratosthenes quicker sequential than concurrently? for some additional ideas on this
Another optimization is not to use -vector- nor -list- , but a large array of divisors, see http://oeis.org/A027750
First step: Sieve of number of divisors
Second step: Sieve of divisors with the total number of divisors
Note: A maximum of 20-fold time increase for 10-fold range. --> O(N*log(N))
Dev-C++ 5.11 , in C
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int SieveNbOfDiv(int NumberOfDivisors[], int IndexCount[], int Limit) {
for (int i = 1; i*i <= Limit; i++) {
NumberOfDivisors[i*i] += 1;
for (int j = i*(i+1); j <= Limit; j += i )
NumberOfDivisors[j] += 2;
}
int Count = 0;
for (int i = 1; i <= Limit; i++) {
Count += NumberOfDivisors[i];
NumberOfDivisors[i] = Count;
IndexCount[i] = Count;
}
return Count;
}
void SieveDivisors(int IndexCount[], int NumberOfDivisors[], int Divisors[], int Limit) {
for (int i = 1; i <= Limit; i++) {
Divisors[IndexCount[i-1]++] = 1;
Divisors[IndexCount[i]-1] = i;
}
for (int i = 2; i*i <= Limit; i++) {
Divisors[IndexCount[i*i-1]++] = i;
for (int j = i*(i+1); j <= Limit; j += i ) {
Divisors[IndexCount[j-1]++] = i;
Divisors[NumberOfDivisors[j-1] + NumberOfDivisors[j] - IndexCount[j-1]] = j/i;
}
}
}
int main(int argc, char *argv[]) {
int N = 1000000;
if (argc > 1) N = atoi(argv[1]);
int ToPrint = 0;
if (argc > 2) ToPrint = atoi(argv[2]);
clock_t Start = clock();
printf("Using sieve of divisors from 1 to %d\n\n", N);
printf("Evaluating sieve of number of divisors ...\n");
int *NumberOfDivisors = (int*) calloc(N+1, sizeof(int));
int *IndexCount = (int*) calloc(N+1, sizeof(int));
int size = SieveNbOfDiv(NumberOfDivisors, IndexCount, N);
printf("Total number of divisors = %d\n", size);
printf("%0.3f second(s)\n\n", (clock() - Start)/1000.0);
printf("Evaluating sieve of divisors ...\n");
int *Divisors = (int*) calloc(size+1, sizeof(int));
SieveDivisors(IndexCount, NumberOfDivisors, Divisors, N);
printf("%0.3f second(s)\n", (clock() - Start)/1000.0);
if (ToPrint == 1)
for (int i = 1; i <= N; i++) {
printf("%d(%d) = ", i, NumberOfDivisors[i] - NumberOfDivisors[i-1]);
for (int j = NumberOfDivisors[i-1]; j < NumberOfDivisors[i]; j++)
printf("%d ", Divisors[j]);
printf("\n");
}
return 0;
}
With some results:
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
c:\Users\Ab\Documents\gcc\sievedivisors>sievedivisors 100000
Using sieve of divisors from 1 to 100000
Evaluating sieve of number of divisors ...
Total number of divisors = 1166750
0.000 second(s)
Evaluating sieve of divisors ...
0.020 second(s)
c:\Users\Ab\Documents\gcc\sievedivisors>sievedivisors 1000000
Using sieve of divisors from 1 to 1000000
Evaluating sieve of number of divisors ...
Total number of divisors = 13970034
0.060 second(s)
Evaluating sieve of divisors ...
0.610 second(s)
c:\Users\Ab\Documents\gcc\sievedivisors>sievedivisors 10000000
Using sieve of divisors from 1 to 10000000
Evaluating sieve of number of divisors ...
Total number of divisors = 162725364
0.995 second(s)
Evaluating sieve of divisors ...
11.900 second(s)
c:\Users\Ab\Documents\gcc\sievedivisors>

Another Pollard Rho Implementation

In an attempt to solve the 3rd problem on project Euler (https://projecteuler.net/problem=3), I decided to implement Pollard's Rho algorithm (at least part of it, I'm planning on including the cycling later). The odd thing is that it works for numbers such as: 82123(factor = 41) and 16843009(factor 257). However when I try the project Euler number: 600851475143, I end up getting 71 when the largest prime factor is 6857. Here's my implementation(sorry for wall of code and lack of type casting):
#include <iostream>
#include <math.h>
#include <vector>
using namespace std;
long long int gcd(long long int a,long long int b);
long long int f(long long int x);
int main()
{
long long int i, x, y, N, factor, iterations = 0, counter = 0;
vector<long long int>factors;
factor = 1;
x = 631;
N = 600851475143;
factors.push_back(x);
while (factor == 1)
{
y = f(x);
y = y % N;
factors.push_back(y);
cout << "\niteration" << iterations << ":\t";
i = 0;
while (factor == 1 && (i < factors.size() - 1))
{
factor = gcd(abs(factors.back() - factors[i]), N);
cout << factor << " ";
i++;
}
x = y;
//factor = 2;
iterations++;
}
system("PAUSE");
return 0;
}
long long int gcd(long long int a, long long int b)
{
long long int remainder;
do
{
remainder = a % b;
a = b;
b = remainder;
} while (remainder != 0);
return a;
}
long long int f(long long int x)
{
//x = x*x * 1024 + 32767;
x = x*x + 1;
return x;
}
Pollard's rho algorithm guarantees nothing. It doesn't guarantee to find the largest factor. It doesn't guarantee that any factor it finds is prime. It doesn't even guarantee to find a factor at all. The rho algorithm is probabilistic; it will probably find a factor, but not necessarily. Since your function returns a factor, it works.
That said, your implementation isn't very good. It's not necessary to store all previous values of the function, and compute the gcd to each every time through the loop. Here is pseudocode for a better version of the function:
function rho(n)
for c from 1 to infinity
h, t := 1, 1
repeat
h := (h*h+c) % n # the hare runs ...
h := (h*h+c) % n # ... twice as fast
t := (t*t+c) % n # as the tortoise
g := gcd(t-h, n)
while g == 1
if g < n then return g
This function returns a single factor of n, which may be either prime or composite. It stores only two values of the random sequence, and stops when it finds a cycle (when g == n), restarting with a different random sequence (by incrementing c). Otherwise it keeps going until it finds a factor, which shouldn't take too long as long as you limit the input to 64-bit integers. Find more factors by applying rho to the remaining cofactor, or if the factor that is found is composite, stopping when all the prime factors have been found.
By the way, you don't need Pollard's rho algorithm to solve Project Euler #3; simple trial division is sufficient. This algorithm finds all the prime factors of a number, from which you can extract the largest:
function factors(n)
f := 2
while f * f <= n
while n % f == 0
print f
n := n / f
f := f + 1
if n > 1 then print n

Approach for better solution - Sum of medians

Here is the question Spoj-WEIRDFN
Problem:
Let us define :
F[1] = 1
F[i] = (a*M[i] + b*i + c)%1000000007 for i > 1
where M[i] is the median of the array {F[1],F[2],..,F[i-1]}
Given a,b,c and n, calculate the sum F[1] + F[2] + .. + F[n].
Constraints:
0 <= a,b,c < 1000000007
1 <= n <= 200000
I came up with a solution which is not so efficient
MY SOLUTION::--
#include <bits/stdc++.h>
using namespace std;
#define ll long long int
#define mod 1000000007
int main() {
// your code goes here
int t;
scanf("%d",&t);
while(t--)
{
ll a,b,c,sum=0;
int n;
scanf("%lld%lld%lld%d",&a,&b,&c,&n);
ll f[n+1];
f[1]=1;
f[0]=0;
for(int i=2;i<=n;i++)
{
ll temp;
sort(&f[1],&f[i]);
temp=f[i/2];
f[i]=((a*(temp)%mod)+((b*i)%mod)+(c%mod))%mod;
sum+=f[i];
}
printf("%lld\n",sum+f[1]);
}
return 0;
}
Can anybody give me hint for for better algorithm or data structure for this task
For each test case, you can maintain a binary search tree, thus you can find the median of n elements in O(log n) time, and you only need O(log n) time to add a new element into the tree.
Thus, we have an O(T*nlogn) algorithm, with T is number of test case, and n is number of elements, which should be enough to pass.

Segmented Sieve of Eratosthenes?

It's easy enough to make a simple sieve:
for (int i=2; i<=N; i++){
if (sieve[i]==0){
cout << i << " is prime" << endl;
for (int j = i; j<=N; j+=i){
sieve[j]=1;
}
}
cout << i << " has " << sieve[i] << " distinct prime factors\n";
}
But what about when N is very large and I can't hold that kind of array in memory? I've looked up segmented sieve approaches and they seem to involve finding primes up until sqrt(N) but I don't understand how it works. What if N is very large (say 10^18)?
The basic idea of a segmented sieve is to choose the sieving primes less than the square root of n, choose a reasonably large segment size that nevertheless fits in memory, and then sieve each of the segments in turn, starting with the smallest. At the first segment, the smallest multiple of each sieving prime that is within the segment is calculated, then multiples of the sieving prime are marked as composite in the normal way; when all the sieving primes have been used, the remaining unmarked numbers in the segment are prime. Then, for the next segment, for each sieving prime you already know the first multiple in the current segment (it was the multiple that ended the sieving for that prime in the prior segment), so you sieve on each sieving prime, and so on until you are finished.
The size of n doesn't matter, except that a larger n will take longer to sieve than a smaller n; the size that matters is the size of the segment, which should be as large as convenient (say, the size of the primary memory cache on the machine).
You can see a simple implementation of a segmented sieve here. Note that a segmented sieve will be very much faster than O'Neill's priority-queue sieve mentioned in another answer; if you're interested, there's an implementation here.
EDIT: I wrote this for a different purpose, but I'll show it here because it might be useful:
Though the Sieve of Eratosthenes is very fast, it requires O(n) space. That can be reduced to O(sqrt(n)) for the sieving primes plus O(1) for the bitarray by performing the sieving in successive segments. At the first segment, the smallest multiple of each sieving prime that is within the segment is calculated, then multiples of the sieving prime are marked composite in the normal way; when all the sieving primes have been used, the remaining unmarked numbers in the segment are prime. Then, for the next segment, the smallest multiple of each sieving prime is the multiple that ended the sieving in the prior segment, and so the sieving continues until finished.
Consider the example of sieve from 100 to 200 in segments of 20. The five sieving primes are 3, 5, 7, 11 and 13. In the first segment from 100 to 120, the bitarray has ten slots, with slot 0 corresponding to 101, slot k corresponding to 100+2k+1, and slot 9 corresponding to 119. The smallest multiple of 3 in the segment is 105, corresponding to slot 2; slots 2+3=5 and 5+3=8 are also multiples of 3. The smallest multiple of 5 is 105 at slot 2, and slot 2+5=7 is also a multiple of 5. The smallest multiple of 7 is 105 at slot 2, and slot 2+7=9 is also a multiple of 7. And so on.
Function primesRange takes arguments lo, hi and delta; lo and hi must be even, with lo < hi, and lo must be greater than sqrt(hi). The segment size is twice delta. Ps is a linked list containing the sieving primes less than sqrt(hi), with 2 removed since even numbers are ignored. Qs is a linked list containing the offest into the sieve bitarray of the smallest multiple in the current segment of the corresponding sieving prime. After each segment, lo advances by twice delta, so the number corresponding to an index i of the sieve bitarray is lo + 2i + 1.
function primesRange(lo, hi, delta)
function qInit(p)
return (-1/2 * (lo + p + 1)) % p
function qReset(p, q)
return (q - delta) % p
sieve := makeArray(0..delta-1)
ps := tail(primes(sqrt(hi)))
qs := map(qInit, ps)
while lo < hi
for i from 0 to delta-1
sieve[i] := True
for p,q in ps,qs
for i from q to delta step p
sieve[i] := False
qs := map(qReset, ps, qs)
for i,t from 0,lo+1 to delta-1,hi step 1,2
if sieve[i]
output t
lo := lo + 2 * delta
When called as primesRange(100, 200, 10), the sieving primes ps are [3, 5, 7, 11, 13]; qs is initially [2, 2, 2, 10, 8] corresponding to smallest multiples 105, 105, 105, 121 and 117, and is reset for the second segment to [1, 2, 6, 0, 11] corresponding to smallest multiples 123, 125, 133, 121 and 143.
You can see this program in action at http://ideone.com/iHYr1f. And in addition to the links shown above, if you are interested in programming with prime numbers I modestly recommend this essay at my blog.
It's just that we are making segmented with the sieve we have.
The basic idea is let's say we have to find out prime numbers between 85 and 100.
We have to apply the traditional sieve,but in the fashion as described below:
So we take the first prime number 2 , divide the starting number by 2(85/2) and taking round off to smaller number we get p=42,now multiply again by 2 we get p=84, from here onwards start adding 2 till the last number.So what we have done is that we have removed all the factors of 2(86,88,90,92,94,96,98,100) in the range.
We take the next prime number 3 , divide the starting number by 3(85/3) and taking round off to smaller number we get p=28,now multiply again by 3 we get p=84, from here onwards start adding 3 till the last number.So what we have done is that we have removed all the factors of 3(87,90,93,96,99) in the range.
Take the next prime number=5 and so on..................
Keep on doing the above steps.You can get the prime numbers (2,3,5,7,...) by using the traditional sieve upto sqrt(n).And then use it for segmented sieve.
There's a version of the Sieve based on priority queues that yields as many primes as you request, rather than all of them up to an upper bound. It's discussed in the classic paper "The Genuine Sieve of Eratosthenes" and googling for "sieve of eratosthenes priority queue" turns up quite a few implementations in various programming languages.
If someone would like to see C++ implementation, here is mine:
void sito_delta( int delta, std::vector<int> &res)
{
std::unique_ptr<int[]> results(new int[delta+1]);
for(int i = 0; i <= delta; ++i)
results[i] = 1;
int pierw = sqrt(delta);
for (int j = 2; j <= pierw; ++j)
{
if(results[j])
{
for (int k = 2*j; k <= delta; k+=j)
{
results[k]=0;
}
}
}
for (int m = 2; m <= delta; ++m)
if (results[m])
{
res.push_back(m);
std::cout<<","<<m;
}
};
void sito_segment(int n,std::vector<int> &fiPri)
{
int delta = sqrt(n);
if (delta>10)
{
sito_segment(delta,fiPri);
// COmpute using fiPri as primes
// n=n,prime = fiPri;
std::vector<int> prime=fiPri;
int offset = delta;
int low = offset;
int high = offset * 2;
while (low < n)
{
if (high >=n ) high = n;
int mark[offset+1];
for (int s=0;s<=offset;++s)
mark[s]=1;
for(int j = 0; j< prime.size(); ++j)
{
int lowMinimum = (low/prime[j]) * prime[j];
if(lowMinimum < low)
lowMinimum += prime[j];
for(int k = lowMinimum; k<=high;k+=prime[j])
mark[k-low]=0;
}
for(int i = low; i <= high; i++)
if(mark[i-low])
{
fiPri.push_back(i);
std::cout<<","<<i;
}
low=low+offset;
high=high+offset;
}
}
else
{
std::vector<int> prime;
sito_delta(delta, prime);
//
fiPri = prime;
//
int offset = delta;
int low = offset;
int high = offset * 2;
// Process segments one by one
while (low < n)
{
if (high >= n) high = n;
int mark[offset+1];
for (int s = 0; s <= offset; ++s)
mark[s] = 1;
for (int j = 0; j < prime.size(); ++j)
{
// find the minimum number in [low..high] that is
// multiple of prime[i] (divisible by prime[j])
int lowMinimum = (low/prime[j]) * prime[j];
if(lowMinimum < low)
lowMinimum += prime[j];
//Mark multiples of prime[i] in [low..high]
for (int k = lowMinimum; k <= high; k+=prime[j])
mark[k-low] = 0;
}
for (int i = low; i <= high; i++)
if(mark[i-low])
{
fiPri.push_back(i);
std::cout<<","<<i;
}
low = low + offset;
high = high + offset;
}
}
};
int main()
{
std::vector<int> fiPri;
sito_segment(1013,fiPri);
}
Based on Swapnil Kumar answer I did the following algorithm in C. It was built with mingw32-make.exe.
#include<math.h>
#include<stdio.h>
#include<stdlib.h>
int main()
{
const int MAX_PRIME_NUMBERS = 5000000;//The number of prime numbers we are looking for
long long *prime_numbers = malloc(sizeof(long long) * MAX_PRIME_NUMBERS);
prime_numbers[0] = 2;
prime_numbers[1] = 3;
prime_numbers[2] = 5;
prime_numbers[3] = 7;
prime_numbers[4] = 11;
prime_numbers[5] = 13;
prime_numbers[6] = 17;
prime_numbers[7] = 19;
prime_numbers[8] = 23;
prime_numbers[9] = 29;
const int BUFFER_POSSIBLE_PRIMES = 29 * 29;//Because the greatest prime number we have is 29 in the 10th position so I started with a block of 841 numbers
int qt_calculated_primes = 10;//10 because we initialized the array with the ten first primes
int possible_primes[BUFFER_POSSIBLE_PRIMES];//Will store the booleans to check valid primes
long long iteration = 0;//Used as multiplier to the range of the buffer possible_primes
int i;//Simple counter for loops
while(qt_calculated_primes < MAX_PRIME_NUMBERS)
{
for (i = 0; i < BUFFER_POSSIBLE_PRIMES; i++)
possible_primes[i] = 1;//set the number as prime
int biggest_possible_prime = sqrt((iteration + 1) * BUFFER_POSSIBLE_PRIMES);
int k = 0;
long long prime = prime_numbers[k];//First prime to be used in the check
while (prime <= biggest_possible_prime)//We don't need to check primes bigger than the square root
{
for (i = 0; i < BUFFER_POSSIBLE_PRIMES; i++)
if ((iteration * BUFFER_POSSIBLE_PRIMES + i) % prime == 0)
possible_primes[i] = 0;
if (++k == qt_calculated_primes)
break;
prime = prime_numbers[k];
}
for (i = 0; i < BUFFER_POSSIBLE_PRIMES; i++)
if (possible_primes[i])
{
if ((qt_calculated_primes < MAX_PRIME_NUMBERS) && ((iteration * BUFFER_POSSIBLE_PRIMES + i) != 1))
{
prime_numbers[qt_calculated_primes] = iteration * BUFFER_POSSIBLE_PRIMES + i;
printf("%d\n", prime_numbers[qt_calculated_primes]);
qt_calculated_primes++;
} else if (!(qt_calculated_primes < MAX_PRIME_NUMBERS))
break;
}
iteration++;
}
return 0;
}
It set a maximum of prime numbers to be found, then an array is initialized with known prime numbers like 2, 3, 5...29. So we make a buffer that will store the segments of possible primes, this buffer can't be greater than the power of the greatest initial prime that in this case is 29.
I'm sure there are a plenty of optimizations that can be done to improve the performance like parallelize the segments analysis process and skip numbers that are multiple of 2, 3 and 5 but it serves as an example of low memory consumption.
A number is prime if none of the smaller prime numbers divides it. Since we iterate over the prime numbers in order, we already marked all numbers, who are divisible by at least one of the prime numbers, as divisible. Hence if we reach a cell and it is not marked, then it isn't divisible by any smaller prime number and therefore has to be prime.
Remember these points:-
// Generating all prime number up to R
// creating an array of size (R-L-1) set all elements to be true: prime && false: composite
#include<bits/stdc++.h>
using namespace std;
#define MAX 100001
vector<int>* sieve(){
bool isPrime[MAX];
for(int i=0;i<MAX;i++){
isPrime[i]=true;
}
for(int i=2;i*i<MAX;i++){
if(isPrime[i]){
for(int j=i*i;j<MAX;j+=i){
isPrime[j]=false;
}
}
}
vector<int>* primes = new vector<int>();
primes->push_back(2);
for(int i=3;i<MAX;i+=2){
if(isPrime[i]){
primes->push_back(i);
}
}
return primes;
}
void printPrimes(long long l, long long r, vector<int>*&primes){
bool isprimes[r-l+1];
for(int i=0;i<=r-l;i++){
isprimes[i]=true;
}
for(int i=0;primes->at(i)*(long long)primes->at(i)<=r;i++){
int currPrimes=primes->at(i);
//just smaller or equal value to l
long long base =(l/(currPrimes))*(currPrimes);
if(base<l){
base=base+currPrimes;
}
//mark all multiplies within L to R as false
for(long long j=base;j<=r;j+=currPrimes){
isprimes[j-l]=false;
}
//there may be a case where base is itself a prime number
if(base==currPrimes){
isprimes[base-l]= true;
}
}
for(int i=0;i<=r-l;i++){
if(isprimes[i]==true){
cout<<i+l<<endl;
}
}
}
int main(){
vector<int>* primes=sieve();
int t;
cin>>t;
while(t--){
long long l,r;
cin>>l>>r;
printPrimes(l,r,primes);
}
return 0;
}

Resources