Generating prime numbers in poly-time - algorithm

I am struggling to see how we can generate a list of the J smallest primes in time polynomial in J, based on the fact that p'j is less than or equal to 2j * ln(j) for j > 2, where p'j denotes the j-th consecutive prime number. For instance, p1 = 2 for j = 1, p2 = 3 for j = 2, p3 = 5 for j = 3, p4 = 7 for j = 4, p5 = 11 for j = 5, etc.
I just don't see how I can make use of this fact. Any time I want to generate a prime, say the 7th, I will check by plugging in: 2(7)*ln(7) = 27.2427... But this is completely useless, as it turns out. This number is way bigger than the last generated prime in my array, which is logical. Hence I still have to resort either to brute force, checking each number after the last prime for mod 0 against each of the primes in my array, or to already existing algorithms that reduce the time to polynomial time.
Can you show me how I can make use of that fact: p'j <= 2j*ln(j)? Thanks.

To show that I can generate a list of the first J primes in time polynomial in J, I need to work out the cost of however I am generating the list.
If I am generating the list by checking numbers one after the other and discarding non-primes, there are two parts to the cost of generating the list - how long it takes to check each number, and how many numbers I need to check.
If primes were vanishingly rare, then I couldn't afford to check every number from 2 on, because simply listing all those numbers would be too expensive. But if I know that the Jth prime is no larger than 2J*ln(J) I can at least list them all in polynomial time.
In fact, if I have to generate J primes, and I start by taking the first 2J*ln(J) numbers, and I decide to check each number for being prime by test dividing it by every prime found so far, I never have more than J primes on hand at any time, so I cannot end up doing more than 2J^2*ln(J) trial divisions. This won't win me any prize for efficiency or clever algorithms, or even sharp bounds for computational cost, but it is no worse than polynomial.
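A minimal Python sketch of that argument, assuming the bound holds as stated for j > 2 (the function name first_primes and the fixed limit of 3 for J <= 2 are my own choices for illustration):

```python
import math

def first_primes(count):
    """Return the first `count` primes by trial division below the 2J*ln(J) bound."""
    if count <= 0:
        return []
    # The bound is only stated for j > 2, so tiny requests use a fixed limit (assumption).
    limit = 3 if count <= 2 else int(2 * count * math.log(count))
    primes = []
    for candidate in range(2, limit + 1):
        # Trial-divide by every prime found so far: at most `count` divisions per candidate.
        if all(candidate % p for p in primes):
            primes.append(candidate)
            if len(primes) == count:
                break
    return primes

print(first_primes(7))
```

The outer loop visits at most 2J*ln(J) candidates and the inner check does at most J divisions, which is where the 2J^2*ln(J) figure comes from.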

Maybe what's tripping you up is that you're thinking that the upper bound gives you a method to directly generate the jth prime, or the first j primes. It doesn't -- it just gives you a size limit on the set of numbers that you need to check with some method you already had lying around, e.g. trial division.
If I give you a list of the n-1 numbers 2, 3, ..., n and ask you to find all the primes in this list, you can do this using trial division in O(n^2) time:
For each pair of distinct numbers x < y, just check whether x divides evenly into y, and cross y out if it does. This step is O(n^2), and requires O(n) space for keeping track of which numbers have been crossed out.
List out all the numbers that were not crossed out. This step is O(n).
Note that this finds all primes <= n in O(n^2) for any positive value of n. So in particular, if we are told some value of j, it will work for n = RoundDown(2j log j).
With n = RoundDown(2j log j), this algorithm runs in time O((2j log j)^2) = O(j^2 log^2 j), which is polynomial in j, and it must succeed in finding at least the first j primes, since the bound tells us that the jth prime can be at most RoundDown(2j log j), and we have included every number up to and including that one in our input list. (It may find even more primes, but we can discard them in linear time if need be.)
Exercise: Think about why we are allowed to round down here.


Devising optimized algorithms for restricted-range partition problem

First of all, I'm referring to the 'partition problem' as the decision problem on whether some multiset of positive integers S can be partitioned into two equal-valued subsets. It is known that although this problem is NP-Complete, it can be solved in pseudo-polynomial O(nm) time via dynamic programming, where n is the number of integers in S, and m is half the sum of those integers (i.e. the target sum for each multiset). Here is an example of this approach in JS:
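In Python rather than JS, the O(nm) dynamic program described above might be sketched as follows. The function name can_partition is my own choice for illustration, and the table only records which sums up to m are reachable.

```python
def can_partition(values):
    """Decide whether the multiset `values` splits into two equal-sum halves."""
    total = sum(values)
    if total % 2:
        return False  # an odd total can never be split evenly
    target = total // 2
    reachable = [True] + [False] * target  # reachable[s]: some sub-multiset sums to s
    for v in values:
        # Walk downwards so each element is used at most once.
        for s in range(target, v - 1, -1):
            if reachable[s - v]:
                reachable[s] = True
    return reachable[target]

print(can_partition([1, 1, 1, 3, 4, 4]))
```

The two nested loops run n times m steps, which is the O(nm) cost under discussion.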
However, since m tends to scale up linearly with n (more precisely, linearly with the mean value of S), this O(nm) approach can still be quite inefficient for large n. Yet it seems to me that more efficient algorithms could be devised for range-restricted instances of the partition problem, where the value of the largest possible integer in S, k, is known in advance, and k is small. To discuss some examples, let's represent S as [n_1, n_2,...n_k], where each n_i represents the number of i's in S (e.g. [3, 0, 1, 2] represents the mset {1,1,1,3,4,4}). Below, we see that the k=1 and k=2 cases are very simple:
partition = [n_1] => n_1 % 2 == 0
(any even number of 1's can be partitioned evenly, and any odd number cannot)
partition = [n_1,n_2] => {
  if (n_1 % 2) return false
  return n_2 % 2 == 0 || n_1 >= 2 // (didn't exit, so n_1 is even)
}
(if n_1 is odd, the sum is odd, so return false. If both n_1 and n_2 are even, there is a trivial partition, but if instead n_2 is odd and n_1 >= 2, you can use a pair of 1's to offset the odd 2. Otherwise, there is no possible partition)
When we extend to the k=3 case, using a similar, parity-based approach, there are 2^3=8 possible cases, 4 of which are trivially false because they yield odd sums (the cases where n_1 and n_3 do not share the same parity), and one of which (the all even [e,e,e]) is trivially true (just take half of each n_i to partition). However, I'm struggling to derive rules for the [e,o,e], [o,e,o], and [o,o,o] instances which are potentially partition-able because they yield even sums, but not necessarily so.
So to sum up my question, I'm wondering if there are some additional conditional checks that could be applied to solve the k=3 case in constant (or perhaps O(2^k)) time, and whether similar 'constant time' algorithms could be derived for k=4,5,6... because even if the logical complexity/number of conditions to check grows exponentially in k, algorithms derived specifically for range-restricted partition instances could still be preferable to the generally applicable O(nm) DP solutions for n and m very large relative to k.
Putting this in a more directly mathematical light, the above process would be equivalent in complexity to determining the existence of satisfying m_1,...m_k in the following constrained Diophantine equation, where the lefthand side represents half the sum of all the integers in the set, and the m_i coefficients on the right hand side (to be solved for) represent the partition in terms of the quantities of each integer:
(1/2) * (1*n_1 + 2*n_2 + ... + k*n_k) = 1*m_1 + 2*m_2 + ... + k*m_k, with 0 <= m_i <= n_i for each i
(Which I don't know how I'd go about solving, but thought I'd throw it out there in case it sparks someone's imagination).
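For experimenting with small k, the count representation [n_1, ..., n_k] can be checked directly. This is only a brute-force sketch for tabulating cases, not one of the constant-time rules being asked about; the name partitionable_counts is an assumption. It looks for counts m_i with 0 <= m_i <= n_i whose weighted sum equals half the total.

```python
def partitionable_counts(counts):
    """counts[i-1] is the number of copies of the integer i in the multiset."""
    total = sum(i * n for i, n in enumerate(counts, start=1))
    if total % 2:
        return False
    half = total // 2
    sums = {0}  # sums reachable using only the integers processed so far
    for i, n in enumerate(counts, start=1):
        # Try taking m = 0..n copies of the integer i on top of every earlier sum.
        sums = {s + i * m for s in sums for m in range(n + 1) if s + i * m <= half}
    return half in sums

print(partitionable_counts([3, 0, 1, 2]))  # the multiset {1,1,1,3,4,4}
```

Running it over small [n_1, n_2, n_3] triples grouped by parity is one way to look for patterns in the [e,o,e], [o,e,o] and [o,o,o] cases.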

Compute number formed by the K-th lexicographic permutation of a given multiset of digits in modulo

You are given Q queries. Each query gives you a multiset S of digits 0 through 9 and an integer k. You are asked to determine the integer representation of the k-th lexicographic permutation of S, modulo 10^9+7.
Restrictions and other notes:
S contains at most 70,000 digits
The integer represented by a permutation of order n, p={p_(n-1),p_(n-2),...,p_1,p_0}, is equal to the sum of p_i*10^i, for all i from 0 to n-1. As an example, the permutation {2,0,1} gives the integer 201. The permutation can also start with multiple 0-s, such that, for example, the permutation {0,0,0,1,2} will give the integer 12.
time limit 2 sec
Some examples:
For S={0,1},k=1, the result will be 1.
For S={0,1},k=2: 10
For S={0,1,2},k=1: 12
For S={0,1,2},k=2: 21
For S={0,1,2},k=5: 201
For S={0,1,1},k=2: 101
I'm having problems finding an efficient enough solution. I tried finding the k-th permutation via the usual method, then simply calculating the modulo, but it isn't fast enough. The modulo really changes things quite a bit, I think.
I've also observed that k is relatively small compared to the number of possible permutations, so this might make room for some optimizations.
"I tried finding the k-th permutation via the usual method"
Not sure what your usual method is. k-th permutation can be obtained in O(|S|) time, I'm assuming you're using that.
"then simply calculating the modulo, but it isn't fast enough"
Notice that you have the same size of S for multiple queries. You should build array D, D[i] = 10^i % M, then for each given permutation simply find sum of D[i]*S[p[i]] % M - once again, linear time.
Actually since k < 15! only last 15 digits change their order, everything before them needs to be computed only once for all queries.
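A rough Python sketch of the power-table idea, assuming the permutation is already available as a list of digits with the most significant digit first (power_table and permutation_value are names I made up for illustration):

```python
MOD = 10**9 + 7

def power_table(length, mod=MOD):
    """table[i] = 10^i % mod, built once and reused across queries."""
    table = [1] * length
    for i in range(1, length):
        table[i] = table[i - 1] * 10 % mod
    return table

def permutation_value(perm, table, mod=MOD):
    """Value of the digit sequence `perm` (most significant digit first) modulo mod."""
    n = len(perm)
    return sum(d * table[n - 1 - i] for i, d in enumerate(perm)) % mod

table = power_table(5)
print(permutation_value([2, 0, 1], table))
```

If only the last few digits differ between queries, the contribution of the unchanged prefix can be computed once and added to the value of the changing suffix.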

What makes this prime factorization so efficient?

I've been doing some Project Euler problems to learn/practice Lua, and my initial quick-and-dirty way of finding the largest prime factor of n was pretty bad, so I looked up some code to see how others were doing it (in attempts to understand different factoring methodologies).
I ran across the following (originally in Python - this is my Lua):
function Main()
  local n = 102
  local i = 2
  while i^2 < n do
    while n%i==0 do n = n / i end
    i = i+1
  end
  return n
end
This factored huge numbers in a very short time - almost immediately. The thing I noticed about the algorithm that I wouldn't have divined:
n = n / i
This seems to be in all of the decent algorithms. I've worked it out on paper with smaller numbers and I can see that it makes the numbers converge, but I don't understand why this operation converges on the largest prime factor.
Can anyone explain?
In this case, i is the prime factor candidate. Consider, n is composed of the following prime numbers:
n = p1^n1 * p2^n2 * p3^n3
When i reaches p1, the statement n = n / i = n / p1 removes one occurrence of p1:
n / p1 = p1^(n1-1) * p2^n2 * p3^n3
The inner while iterates as long as there are p1s in n. Thus, after the iteration is complete (when i = i + 1 is executed), all occurrences of p1 have been removed and:
n' = p2^n2 * p3^n3
Let's skip some iterations until i reaches p3. The remaining n is then:
n'' = p3^n3
Here, we find a first mistake in the code. If n3 is 2, then the outer condition does not hold and we remain with p3^2. It should be while i^2 <= n.
As before, the inner while removes all occurrences of p3, leaving us with n'''=1. This is the second mistake. It should be while n%i==0 and n>i (not sure about the Lua syntax), which keeps the very last occurrence.
So the above code works for all numbers n where the largest prime factor occurs only once, by successively removing all other factors. For all other numbers, the mentioned corrections should make it work, too.
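With both corrections applied, the loop might look like this in Python (a sketch rather than the original Lua; the function name largest_prime_factor is my own):

```python
def largest_prime_factor(n):
    """Trial division that strips small factors and keeps the last copy of the largest."""
    i = 2
    while i * i <= n:                   # first fix: <= instead of <
        while n % i == 0 and n > i:     # second fix: never divide n down to 1
            n //= i
        i += 1
    return n

print(largest_prime_factor(102))
```

Integer division (//) keeps n an integer throughout, which matters for large inputs.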
This strips all the smaller prime factors out of n, so n shrinks and the loop reaches sqrt(n) sooner. That is where the performance boost comes from: you no longer have to run candidates up to the square root of the original n. Say n is a million, which consists only of 2's and 5's. Naively testing against all known primes would mean checking every prime up to 1000, whereas dividing out the 2's yields 15625 and then dividing out the 5's yields 1, effectively factoring the big number in two steps. (By the way, your algorithm will return 1 here! To fix that, if your loop exits with n=1, return the last i that divided n instead.) But this only helps with "common" numbers that have a single large prime factor and a bunch of smaller ones; factoring a number n=p*q where p and q are both primes and close together won't benefit from this boost.
The n=n/i line works because if n has another prime factor besides the divisor i you have just found, the quotient n/i is still divisible by that prime, by definition of prime numbers. Also, this only works in your case because your i runs from 2 upward, so you always divide by primes before their composites. Otherwise, if your number had 3 as its largest prime factor and was also divisible by 2, and you checked against 6 first, you would spoil the principle of only dividing by primes by accidentally dividing by a composite of the largest prime (say with 72: if you first divide by 6, you'll end up with 2, while the answer is 3).
This algorithm (when corrected) takes O(max(p2,sqrt(p1))) steps to find the prime factorization of n, where p1 is the largest prime factor and the p2 is the second largest prime factor. In case of a repeated largest prime factor, p1=p2.
Knuth and Trabb Pardo studied the behavior of this function in "Analysis of a Simple Factorization Algorithm", Theoretical Computer Science 3 (1976) 321-348. They argued against the usual analysis such as computing the average number of steps taken when factoring integers up to n. Although a few numbers with large prime factors boost the average value, in a cryptographic context what may be more relevant is that some of the percentiles are quite low. For example, 44.7% of numbers satisfy max(sqrt(p1),p2)<n^(1/3), and 1.2% of numbers satisfy max(sqrt(p1),p2)<n^(1/5).
A simple improvement is to test the remainder for primality after you find a new prime factor. It is very fast to test whether a number is prime. This reduces the time to O(p2) by avoiding the trial divisions between p2 and sqrt(p1). The median size of the second largest prime is about n^0.21. This means it is feasible to factor many 45-digit numbers rapidly (in a few processor-seconds) using this improvement on trial division. By comparison, Pollard-rho factorization on a product of two primes takes O(sqrt(p2)) steps on average, according to one model.
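A sketch of that improvement in Python. The answer does not say which primality test to use; Miller-Rabin with a fixed set of bases is my own assumption here (this base set is known to be exact for 64-bit inputs), and the names is_prime and factorize are illustrative.

```python
def is_prime(n):
    """Miller-Rabin with fixed bases (assumption: inputs fit in 64 bits)."""
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # base a witnesses that n is composite
    return True

def factorize(n):
    """Trial division that stops as soon as the remaining cofactor is prime."""
    if n > 1 and is_prime(n):
        return [n]
    factors = []
    i = 2
    while n > 1:
        if n % i == 0:
            while n % i == 0:
                factors.append(i)
                n //= i
            # After each new prime factor, test the remainder for primality.
            if n > 1 and is_prime(n):
                factors.append(n)
                n = 1
        i += 1
    return factors

print(factorize(600851475143))
```

The loop never has to walk from the second largest prime up to the square root of the largest one, which is the saving described above.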

Product of Prime factors of a number

Given a number X, what would be the most efficient way to calculate the product of the prime factors of that number?
Is there a way to do this without actual factorisation?
Note: the product of the distinct prime factors is needed (each taken to the power one).
This answer addresses the second half of your question - i.e. is it possible to compute the product of the prime factors without factorising the number. This answer shows that it is possible to do, and shows a method that is more efficient than a naive method of factorisation. However, as noted in the comments, this proposed method is still not as efficient as factorising the number using a more advanced method.
Let k be the cube root of the number.
Check the number for all primes of size k or smaller and divide out any we find.
We now know that the resulting number is a product of primes larger than k, so it must either be 1, a single prime, or a product of 2 primes. (It cannot have more than 2 primes because k is the cube root of the number.)
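The answer only changes if it is the square of a single prime, which we can detect by simply testing whether the number is a perfect square: if it is, we keep just its square root; otherwise (1, a single prime, or a product of 2 distinct primes) we keep the number as it is.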
The results of this allow us to calculate the result in O(n^(1/3) / log(n)) assuming we have precomputed a list of primes.
Suppose we have the number 9409.
The cube root is 21.1 so we first check for divisibility by primes under 21.
None of them find a result so we compute the sqrt and find 9409== 97**2.
This means that the answer is 97.
Suppose we have the number 9797.
The cube root is 21.4 so we check for divisibility by primes under 21.
None of them find a result so we compute the sqrt and find 9797 is not a perfect square.
Therefore we conclude the answer is 9797. (Note that we have not determined the factorisation to work out this answer. In fact the factorisation is 97*101.)
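The steps above can be sketched in Python as follows. Two assumptions on my part: the loop divides by every integer p rather than a precomputed prime list, and it stops when p^3 exceeds the remaining number instead of using the cube root of the original one (the argument about at most two remaining prime factors still applies). The name radical is illustrative.

```python
import math

def radical(n):
    """Product of the distinct prime factors of n, trial-dividing only up to a cube root."""
    result = 1
    p = 2
    while p * p * p <= n:
        if n % p == 0:
            result *= p          # keep one copy of p
            while n % p == 0:
                n //= p          # divide out every copy
        p += 1
    # What remains is 1, a prime, a prime squared, or a product of two distinct primes.
    root = math.isqrt(n)
    if root * root == n:
        n = root                 # prime squared (or 1): keep a single copy
    return result * n

print(radical(9409), radical(9797))
```

Note that for 9797 the function returns the right product without ever learning that it is 97*101.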
Maple and Mathematica both calculate the squarefree kernel of a number by factoring and then multiplying back together just one copy of each prime, so I doubt a better way is known.
Another approach is to start with the number itself. It is obviously a product of all its prime factors. You want to remove all the factors with power more than one. Hence, you don't mind if the number has a factor 2, but you do mind if it has a factor 4 (2^2). We can solve the problem by removing the extra factors.
Simple pseudocode:
method removeHigherPrimePowers(number)
  temp <- number
  primes <- [2, 3, 5, 7 ...]
  for each p in primes
    factor <- p * p // factor = 4, 9, 25, ...
    while (temp MOD factor = 0)
      temp <- temp / p // Remove extra factor of p
  return temp
The number is being factorised, but the factorising is somewhat hidden. All those MOD statements do the same work. All that is being saved is a certain amount of accounting, keeping track of the factors found so far and multiplying them all together at the end.
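A runnable Python version of that pseudocode might look like this. As an assumption for simplicity it iterates over every integer p instead of a prime list; that is harmless because once the square of a prime has been removed, the square of any multiple of that prime can no longer divide temp.

```python
def remove_higher_prime_powers(number):
    """Divide out repeated prime factors until every prime appears at most once."""
    temp = number
    p = 2
    while p * p <= temp:
        while temp % (p * p) == 0:
            temp //= p   # remove one extra factor of p
        p += 1
    return temp

print(remove_higher_prime_powers(72))
```

The loop bound p*p <= temp shrinks along with temp, so the work is at most about sqrt(number) iterations.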
As Peter says, you can test all primes up to the cube root, and then check if the number remaining is square.

Generate all subset sums within a range faster than O((k+N) * 2^(N/2))?

Is there a way to generate all of the subset sums s1, s2, ..., sk that fall in a range [A,B] faster than O((k+N)*2^(N/2)), where k is the number of sums there are in [A,B]? Note that k is only known after we have enumerated all subset sums within [A,B].
I'm currently using a modified Horowitz-Sahni algorithm. For example, I first call it for the smallest sum greater than or equal to A, giving me s1. Then I call it again for the next smallest sum greater than s1, giving me s2. Repeat this until we find a sum s_(k+1) greater than B. There is a lot of computation repeated between each iteration, even without rebuilding the initial two 2^(N/2) lists, so is there a way to do better?
In my problem, N is about 15, and the magnitude of the numbers is on the order of millions, so I haven't considered the dynamic programming route.
Check the subset sum on Wikipedia. As far as I know, it's the fastest known algorithm, which operates in O(2^(N/2)) time.
If you're looking for multiple possible sums, instead of just 0, you can save the end arrays and just iterate through them again (which is roughly an O(2^(N/2)) operation) and save re-computing them. The values of all the possible subsets don't change with the target.
Edit again:
I'm not wholly sure what you want. Are we running K searches for one independent value each, or looking for any subset that has a value in a specific range that is K wide? Or are you trying to approximate the second by using the first?
Edit in response:
Yes, you do get a lot of duplicate work even without rebuilding the list. But if you don't rebuild the list, that's not O(k * N * 2^(N/2)). Building the list is O(N * 2^(N/2)).
If you know A and B right now, you could begin iteration, and then simply not stop when you find the right answer (the bottom bound), but keep going until it goes out of range. That should be roughly the same as solving subset sum for just one solution, involving only +k more ops, and when you're done, you can ditch the list.
More edit:
You have a range of sums, from A to B. First, you solve subset sum problem for A. Then, you just keep iterating and storing the results, until you find the solution for B, at which point you stop. Now you have every sum between A and B in a single run, and it will only cost you one subset sum problem solve plus K operations for K values in the range A to B, which is linear and nice and fast.
s = *i + *j; if s > B then ++i; else if s < A then ++j; else { print s; ... what_goes_here? ... }
No, no, no. I get the source of your confusion now (I misread something), but it's still not as complex as what you had originally. If you want to find ALL combinations within the range, instead of one, you will just have to iterate over all combinations of both lists, which isn't too bad.
Excuse my use of auto. C++0x compiler.
#include <algorithm>
#include <vector>

std::vector<int> sums;
std::vector<int> firstlist;
std::vector<int> secondlist;
// Fill in first/secondlist.
std::sort(firstlist.begin(), firstlist.end());
std::sort(secondlist.begin(), secondlist.end());
// Since we want all in a range, rather than just the first, we need to check all combinations. Horowitz/Sahni is only designed to find one.
for (auto firstit = firstlist.begin(); firstit != firstlist.end(); ++firstit) {
    // Restart the inner iterator for every element of the first list.
    for (auto secondit = secondlist.begin(); secondit != secondlist.end(); ++secondit) {
        int sum = *firstit + *secondit;
        if (sum >= A && sum <= B)  // A and B are the range bounds from the question
            sums.push_back(sum);
    }
}
It's still not great. But it could be optimized if you know in advance that N is very large, for example, mapping or hashmapping sums to iterators, so that any given firstit can find any suitable partners in secondit, reducing the running time.
It is possible to do this in O(N*2^(N/2)), using ideas similar to Horowitz Sahni, but we try and do some optimizations to reduce the constants in the BigOh.
We do the following
Step 1: Split into sets of N/2, and generate all possible 2^(N/2) sets for each split. Call them S1 and S2. This we can do in O(2^(N/2)) (note: the N factor is missing here, due to an optimization we can do).
Step 2: Next sort the larger of S1 and S2 (say S1) in O(N*2^(N/2)) time (we optimize here by not sorting both).
Step 3: Find Subset sums in range [A,B] in S1 using binary search (as it is sorted).
Step 4: Next, for each sum in S2, find using binary search the sets in S1 whose union with this gives sum in range [A,B]. This is O(N*2^(N/2)). At the same time, find if that corresponding set in S2 is in the range [A,B]. The optimization here is to combine loops. Note: This gives you a representation of the sets (in terms of two indexes into S1), not the sets themselves. If you want all the sets, this becomes O(K + N*2^(N/2)), where K is the number of sets.
Further optimizations might be possible, for instance when sum from S2, is negative, we don't consider sums < A etc.
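A compact Python sketch of Steps 2 to 4, using the standard bisect module for the binary searches. For brevity it builds the half-lists by plain doubling rather than Gray codes and returns the sums themselves rather than index ranges; subset_sums and sums_in_range are names I made up.

```python
from bisect import bisect_left, bisect_right

def subset_sums(terms):
    """All 2^len(terms) subset sums, in no particular order."""
    sums = [0]
    for t in terms:
        sums += [s + t for s in sums]
    return sums

def sums_in_range(terms, low, high):
    """Every subset sum of `terms` lying in [low, high], with multiplicity."""
    half = len(terms) // 2
    s1 = sorted(subset_sums(terms[:half]))   # only one half is sorted
    s2 = subset_sums(terms[half:])
    found = []
    for s in s2:
        # Two binary searches give the slice of s1 that pairs with s.
        lo = bisect_left(s1, low - s)
        hi = bisect_right(s1, high - s)
        found.extend(s + x for x in s1[lo:hi])
    return sorted(found)

print(sums_in_range([1, 2, 3, 4], 3, 4))
```

Because the empty subset contributes a sum of 0 to both halves, sums that use only one half are picked up by the same loop.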
Since Steps 2,3,4 should be pretty clear, I will elaborate further on how to get Step 1 done in O(2^(N/2)) time.
For this, we use the concept of Gray Codes. Gray codes are a sequence of binary bit patterns in which each pattern differs from the previous pattern in exactly one bit.
Example: 00 -> 01 -> 11 -> 10 is a gray code with 2 bits.
There are gray codes which go through all possible N/2 bit numbers and these can be generated iteratively (see the wiki page I linked to), in O(1) time for each step (total O(2^(N/2)) steps), given the previous bit pattern, i.e. given current bit pattern, we can generate the next bit pattern in O(1) time.
This enables us to form all the subset sums, by using the previous sum and changing that by just adding or subtracting one number (corresponding to the differing bit position) to get the next sum.
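A small Python sketch of the Gray code walk. It relies on the standard fact that the bit flipped at step number `step` of the reflected Gray code is the lowest set bit of `step`; the function name gray_code_subset_sums is an assumption.

```python
def gray_code_subset_sums(terms):
    """Visit all subsets in Gray code order, updating the running sum by one term per step."""
    sums = [0]                       # the empty subset
    current = 0
    included = [False] * len(terms)
    for step in range(1, 1 << len(terms)):
        # Position of the single bit that changes between consecutive Gray codes.
        bit = (step & -step).bit_length() - 1
        if included[bit]:
            current -= terms[bit]    # the term leaves the subset
        else:
            current += terms[bit]    # the term enters the subset
        included[bit] = not included[bit]
        sums.append(current)
    return sums

print(gray_code_subset_sums([1, 2, 4]))
```

Each step does one addition or subtraction, so the whole list of 2^(N/2) sums costs a constant amount of arithmetic per entry.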
If you modify the Horowitz-Sahni algorithm in the right way, then it's hardly slower than original Horowitz-Sahni. Recall that Horowitz-Sahni works with two lists of subset sums: sums of subsets in the left half of the original list, and sums of subsets in the right half. Call these two lists of sums L and R. To obtain subsets that sum to some fixed value A, you can sort R, and then look up a number in R that matches each number in L using a binary search. However, the algorithm is asymmetric only to save a constant factor in space and time. It's a good idea for this problem to sort both L and R.
In my code below I also reverse L. Then you can keep two pointers into R, updated for each entry in L: A pointer to the last entry in R that's too low, and a pointer to the first entry in R that's too high. When you advance to the next entry in L, each pointer might either move forward or stay put, but they won't have to move backwards. Thus, the second stage of the Horowitz-Sahni algorithm only takes linear time in the data generated in the first stage, plus linear time in the length of the output. Up to a constant factor, you can't do better than that (once you have committed to this meet-in-the-middle algorithm).
Here is Python code with example input:
# Input
terms = [29371, 108810, 124019, 267363, 298330, 368607,
         438140, 453243, 515250, 575143, 695146, 840979, 868052, 999760]
(A, B) = (500000, 600000)

# Subset iterator stolen from Sage
def subsets(X):
    yield []
    pairs = []
    for x in X:
        pairs.append((2**len(pairs), x))
        for w in range(2**(len(pairs)-1), 2**len(pairs)):
            yield [x for m, x in pairs if m & w]

# Modified Horowitz-Sahni with toolow and toohigh indices
half = len(terms) // 2
L = sorted([(sum(S), S) for S in subsets(terms[:half])])
R = sorted([(sum(S), S) for S in subsets(terms[half:])])
(toolow, toohigh) = (-1, 0)
for (Lsum, S) in reversed(L):
    # Check the index bounds before reading R to avoid running off the end
    while toolow < len(R)-1 and R[toolow+1][0] < A-Lsum: toolow += 1
    while toohigh < len(R) and R[toohigh][0] <= B-Lsum: toohigh += 1
    for n in range(toolow+1, toohigh):
        print('+'.join(map(str, S+R[n][1])), '=', sum(S+R[n][1]))
"Moron" (I think he should change his user name) raises the reasonable issue of optimizing the algorithm a little further by skipping one of the sorts. Actually, because each list L and R is a list of sizes of subsets, you can do a combined generate and sort of each one in linear time! (That is, linear in the lengths of the lists.) L is the union of two lists of sums, those that include the first term, term[0], and those that don't. So actually you should just make one of these halves in sorted form, add a constant, and then do a merge of the two sorted lists. If you apply this idea recursively, you save a logarithmic factor in the time to make a sorted L, i.e., a factor of N in the original variable of the problem. This gives a good reason to sort both lists as you generate them. If you only sort one list, you have some binary searches that could reintroduce that factor of N; at best you have to optimize them somehow.
At first glance, a factor of O(N) could still be there for a different reason: If you want not just the subset sum, but the subset that makes the sum, then it looks like O(N) time and space to store each subset in L and in R. However, there is a data-sharing trick that also gets rid of that factor of O(N). The first step of the trick is to store each subset of the left or right half as a linked list of bits (1 if a term is included, 0 if it is not included). Then, when the list L is doubled in size as in the previous paragraph, the two linked lists for a subset and its partner can be shared, except at the head:
1 -> 1 -> 0 -> ...
Actually, this linked list trick is an artifact of the cost model and never truly helpful. Because, in order to have pointers in a RAM architecture with O(1) cost, you have to define data words with O(log(memory)) bits. But if you have data words of this size, you might as well store each word as a single bit vector rather than with this pointer structure. I.e., if you need less than a gigaword of memory, then you can store each subset in a 32-bit word. If you need more than a gigaword, then you have a 64-bit architecture or an emulation of it (or maybe 48 bits), and you can still store each subset in one word. If you patch the RAM cost model to take account of word size, then this factor of N was never really there anyway.
So, interestingly, the time complexity for the original Horowitz-Sahni algorithm isn't O(N*2^(N/2)), it's O(2^(N/2)). Likewise the time complexity for this problem is O(K+2^(N/2)), where K is the length of the output.
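The combined generate-and-sort idea described above can be sketched in a few lines of Python. This version keeps only the sums, not the subsets, and uses heapq.merge from the standard library for the linear-time merge of two sorted lists; the function name sorted_subset_sums is illustrative.

```python
from heapq import merge

def sorted_subset_sums(terms):
    """Build the sorted list of all subset sums without a separate sorting pass."""
    sums = [0]
    for t in terms:
        # Sums that include t are the old sorted list shifted by a constant,
        # so they are already sorted; merging the two lists is linear.
        shifted = [s + t for s in sums]
        sums = list(merge(sums, shifted))
    return sums

print(sorted_subset_sums([3, 1, 2]))
```

The list doubles at each term, so the total merge work is 1 + 2 + 4 + ... which stays proportional to the final length.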
