Practical Prime Factorization - algorithm

I've read about factorization of integers into the prime factors and did a proof of concept implementation of Pollard's rho algorithm:
https://en.wikipedia.org/wiki/Pollard%27s_rho_algorithm
The algorithm is easy to implement and works so far. However, it may (and does) fail for certain numbers. The Wikipedia page suggests restarting the algorithm with a different starting value or with a different generator function.
That does not sound very deterministic to me. Is there an approach that guarantees that the algorithm will eventually terminate?
I know that the state of the art is the elliptic curve integer factorization, and I plan to implement it for laughs and giggles later. To do so I first need a prime factorization algorithm for small numbers though, which is why I have to deal with something like Pollard's rho algorithm first.
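One way to get guaranteed termination, sketched here under some assumptions: a single run of rho with Floyd cycle detection always stops on its own (the sequence modulo n must eventually cycle, at which point the gcd becomes n and the run reports failure), so it is enough to bound the number of restarts and fall back to trial division. The function names, the restart limit of 20, and the choice of f(x) = x^2 + c with c = 1, 2, 3, ... are illustrative choices, not something the Wikipedia page prescribes.

```python
import math
from typing import List, Optional


def trial_division(n: int) -> int:
    """Return the smallest prime factor of n >= 2 (n itself if n is prime)."""
    if n % 2 == 0:
        return 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d
        d += 2
    return n


def rho_attempt(n: int, c: int, x0: int = 2) -> Optional[int]:
    """One run of Pollard's rho with f(x) = x^2 + c (mod n).

    Returns a nontrivial factor of n, or None if this run failed.
    """
    x = y = x0
    d = 1
    while d == 1:
        x = (x * x + c) % n          # tortoise: one step
        y = (y * y + c) % n          # hare: two steps
        y = (y * y + c) % n
        d = math.gcd(abs(x - y), n)  # becomes n once x == y (cycle closed)
    return None if d == n else d


def find_factor(n: int, max_restarts: int = 20) -> int:
    """Return a nontrivial factor of n >= 2, or n itself if n is prime."""
    if n % 2 == 0:
        return 2
    for c in range(1, max_restarts + 1):  # restart with a different constant
        d = rho_attempt(n, c)
        if d is not None:
            return d
    return trial_division(n)              # deterministic fallback


def factorize(n: int) -> List[int]:
    """Return the prime factors of n >= 1 with multiplicity, sorted."""
    if n == 1:
        return []
    d = find_factor(n)
    if d == n:
        return [n]
    return sorted(factorize(d) + factorize(n // d))


print(factorize(8051))
```

For a prime n every rho run fails and the fallback does all the work, so in practice it is probably worth running a primality test first and calling rho only on numbers known to be composite.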

Related

Difference between a stochastic and a heuristic algorithm

Extending the question of streetparade, I would like to ask what is the difference, if any, between a stochastic and a heuristic algorithm.
Would it be right to say that a stochastic algorithm is actually one type of heuristic?
TTBOMK, "stochastic algorithm" is not a standard term. "Randomized algorithm" is, however, and it's probably what is meant here.
Randomized: Uses randomness somehow. There are two flavours: Monte Carlo algorithms always finish in bounded time, but don't guarantee an optimal solution, while Las Vegas algorithms aren't necessarily guaranteed to finish in any finite time, but promise to find the optimal solution. (Usually they are also required to have a finite expected running time.) Examples of common Monte Carlo algorithms: MCMC, simulated annealing, and Miller-Rabin primality testing. Quicksort with randomized pivot choice is a Las Vegas algorithm that always finishes in finite time. An algorithm that does not use any randomness is deterministic.
Heuristic: Not guaranteed to find the correct answer. An algorithm that is not heuristic is exact.
Many heuristics are sensitive to "incidental" properties of the input that don't affect the true solution, such as the order items are considered in the First-Fit heuristic for the Bin Packing problem. In this case they can be thought of as Monte Carlo randomized algorithms: you can randomly permute the inputs and rerun them, always keeping the best answer you find. OTOH, other heuristics don't have this property -- e.g. the First-Fit-Decreasing heuristic is deterministic, since it always first sorts the items in decreasing size order.
If the set of possible outputs of a particular randomized algorithm is finite and contains the true answer, then running it long enough is "practically guaranteed" to eventually find it (in the sense that the probability of not finding it can be made arbitrarily small, but never 0). Note that it's not automatically the case that some permutation of the inputs to a heuristic will result in getting the exact answer -- in the case of First-Fit, it turns out that this is true, but this was only proven in 2009.
Sometimes stronger statements about convergence of randomized algorithms can be made: these are usually along the lines of "For any given small threshold d, after t steps we will be within d of the optimal solution with probability f(t, d)", with f(t, d) an increasing function of t and d.
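To make the First-Fit remark concrete, here is a minimal sketch of "randomly permute the inputs and rerun, keeping the best answer". The function names, the number of tries and the fixed seed are assumptions made for the example; item sizes are integers to keep the arithmetic exact.

```python
import random
from typing import List, Sequence


def first_fit(items: Sequence[int], capacity: int) -> List[List[int]]:
    """Pack items in the given order, each into the first bin with room."""
    bins: List[List[int]] = []
    for item in items:
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            bins.append([item])  # no existing bin had room
    return bins


def first_fit_decreasing(items: Sequence[int], capacity: int) -> List[List[int]]:
    """Deterministic variant: sort by decreasing size, then First-Fit."""
    return first_fit(sorted(items, reverse=True), capacity)


def first_fit_random_restarts(items: Sequence[int], capacity: int,
                              tries: int = 200, seed: int = 0) -> List[List[int]]:
    """Monte Carlo use of First-Fit: shuffle, rerun, keep the best packing."""
    rng = random.Random(seed)
    best = first_fit(items, capacity)
    order = list(items)
    for _ in range(tries):
        rng.shuffle(order)
        candidate = first_fit(order, capacity)
        if len(candidate) < len(best):
            best = candidate
    return best


items = [3, 3, 3, 3, 7, 7, 7, 7]
print(len(first_fit(items, 10)))             # input order: 5 bins
print(len(first_fit_decreasing(items, 10)))  # sorted order: 4 bins
print(len(first_fit_random_restarts(items, 10)))
```

The running time is bounded by the number of tries, but nothing guarantees that the best packing found is optimal, which is what makes this a Monte Carlo style algorithm.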
Both approaches are usually used to speed up generate-and-test solutions to NP-complete problems.
Stochastic algorithms use randomness
They may go through all combinations, but not in order; instead they pick random ones from the whole range of possibilities, hoping to hit the solution sooner. Implementation is fast and easy, and a single iteration is also fast (constant time).
Heuristic algorithms
They pick the combinations not randomly but based on some knowledge of the process used, the input dataset, or the usage. This lowers the number of combinations significantly, to only those that are likely to be the solution; usually all of those are then tried until the solution is found.
Implementation complexity depends on the problem, and a single iteration is usually much slower than in the stochastic approach (constant time). So heuristics are used only if the number of possibilities is lowered enough for an actual speed-up to be visible: even if the algorithmic complexity with a heuristic is usually much lower, sometimes the constant factor is big enough to slow things down (in runtime terms).
Both approaches can be combined.

Where is the Sieve of Eratosthenes used today?

I'm doing a research paper on the topic and while I find a lot of examples and discussion about how the algorithm works/should be implemented, I can't find anything on where it's actually used.
Is there any field in which the algorithm is used today? Or do people just implement it for "shits 'n giggles" (it's fairly simple, so that would make some sense)?
I know that large prime numbers are important in the field of encryption, but I doubt the sieve is used to find/generate those primes. Also, the huge amount of memory needed to find large primes makes it inefficient for those, too.
So is the algorithm, in any form, used anywhere today?
According to the Wikipedia article on the subject, that particular sieve is still a very efficient method for producing the full list of primes whose value is less than a few million. Also, the general idea of a sieve is used in several other, more powerful algorithms, such as the general number field sieve for factoring large integers.
You can view a prime sieve as an application of dynamic programming to small complete prime number enumeration and testing. So your question is really "what do we need prime numbers for?". They are a fundamental part of number theory. As one example encoding an integer into its prime factorization has all sorts of useful properties and higher-level utility. By adding backtracking to a sieve we can perform this factorization very quickly.
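One possible reading of "adding backtracking to a sieve" is to record, for every number, the smallest prime that divides it, and then walk back through that table to read off a factorization. The sketch below follows that interpretation; the function names and the table layout are assumptions, not something the answer specifies.

```python
from typing import List


def smallest_prime_factors(limit: int) -> List[int]:
    """Sieve of Eratosthenes that stores the smallest prime factor of each n."""
    spf = list(range(limit + 1))  # spf[n] == n means "no smaller factor found"
    i = 2
    while i * i <= limit:
        if spf[i] == i:           # i is prime
            for j in range(i * i, limit + 1, i):
                if spf[j] == j:   # keep only the smallest prime factor
                    spf[j] = i
        i += 1
    return spf


def primes_from(spf: List[int]) -> List[int]:
    """All primes covered by the table."""
    return [n for n in range(2, len(spf)) if spf[n] == n]


def factorize(n: int, spf: List[int]) -> List[int]:
    """Prime factors of n (with multiplicity) by repeated table lookups."""
    factors = []
    while n > 1:
        factors.append(spf[n])
        n //= spf[n]
    return factors


spf = smallest_prime_factors(100)
print(primes_from(spf)[:10])
print(factorize(84, spf))
```

After the one-off sieve, each factorization needs only as many lookups as n has prime factors, which is at most log2(n).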

why is integer factorization a non-polynomial time?

I am just a beginner of computer science. I learned something about running time but I can't be sure what I understood is right. So please help me.
So integer factorization is currently not a polynomial time problem but primality test is. Assume the number to be checked is n. If we run a program just to decide whether every number from 1 to sqrt(n) can divide n, and if the answer is yes, then store the number. I think this program is polynomial time, isn't it?
One possible way that I am wrong would be a factorization program should find all primes, instead of the first prime discovered. So maybe this is the reason why.
However, in public key cryptography, finding a prime factor of a large number is essential to attack the cryptography. Since usually a large number (public key) is only the product of two primes, finding one prime means finding the other. This should be polynomial time. So why is it difficult or impossible to attack?
Casual descriptions of complexity like "polynomial factoring algorithm" generally refer to the complexity with respect to the size of the input, not the interpretation of the input. So when people say "no known polynomial factoring algorithm", they mean there is no known algorithm for factoring N-bit natural numbers that runs in time polynomial with respect to N. Not polynomial with respect to the number itself, which can be up to 2^N.
The difficulty of factorization is one of those beautiful mathematical problems that's simple to understand and takes you immediately to the edge of human knowledge. To summarize (today's) knowledge on the subject: we don't know why it's hard, not with any degree of proof, and the best methods we have run in more than polynomial time (but also significantly less than exponential time). The result that primality testing is even in P is pretty recent; see the linked Wikipedia page.
The best heuristic explanation I know for the difficulty is that primes are randomly distributed. One of the easier-to-understand results is Dirichlet's theorem. This theorem says that every arithmetic progression a, a+d, a+2d, ... with a and d coprime contains infinitely many primes; in other words, you can think of primes as being dense with respect to progressions, meaning you can't avoid running into them. This is the simplest of a rather large collection of such results; in all of them, primes appear in ways very much analogous to random numbers.
The difficulty of factoring is thus analogous to the impossibility of reversing a one-time pad. In a one-time pad, there's a bit we don't know XORed with another one we don't know. We get zero information about an individual bit knowing the result of the XOR. Replace "bit" with "prime" and XOR with multiplication, and you have the factoring problem. It's as if you've multiplied two random numbers together, and you get very little information from the product (instead of zero information).
"If we run a program just to decide whether every number from 1 to sqrt(n) can divide n, and if the answer is yes, then store the number."
Even ignoring that the divisibility test will take longer for bigger numbers, this approach takes about 1.4 times as long (a factor of sqrt(2)) if you add a single binary digit to n, and twice as long if you add two digits.
I think that is essentially what exponential runtime means: make n a fixed number of bits longer, and the algorithm takes twice as long.
But note that this observation applies only to the algorithm you proposed. It is still unknown if integer factorization is polynomial or not. The cryptographers sure hope that it is not, but there are also alternative algorithms that do not depend on prime factorization being hard (such as elliptic curve cryptography), just in case...
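The growth of the trial-division approach from the question can be checked with a few lines. This sketch only counts the candidate divisors 2..floor(sqrt(n)) for the largest n of a given bit length; it does not time anything, and the function name is made up for the example.

```python
import math


def worst_case_trial_divisions(bits: int) -> int:
    """Number of candidate divisors 2..floor(sqrt(n)) for n = 2**bits - 1."""
    n = (1 << bits) - 1
    return math.isqrt(n) - 1


for bits in (16, 18, 32, 64, 128):
    print(bits, worst_case_trial_divisions(bits))
```

Going from 16 to 18 bits takes the count from 254 to 510, so two extra bits roughly double the work, and the count for 64-bit inputs is already above four billion. The work is polynomial in the value of n but exponential in the number of bits, which is the measure that matters.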

Fastest and most reliable factorization method

Which is the fastest and most reliable factorization method in use nowadays? I have gone through Fermat's factorization and Pollard's rho factorization methods and was wondering whether there are any better methods to code and implement.
Please check the Wikipedia article. It has almost everything you want to find: http://en.wikipedia.org/wiki/Integer_factorization
The solution really depends on the range of the number, and sometimes the property of the number.
For big numbers of about 100 digits or fewer, according to Wikipedia, the quadratic sieve is the best. For larger numbers, the general number field sieve is better.
I won't go into small cases: as you already mention Pollard's rho, those should be trivial.

Simple deterministic primality testing for small numbers

I am aware that there are a number of primality testing algorithms used in practice (Sieve of Eratosthenes, Fermat's test, Miller-Rabin, AKS, etc). However, they are either slow (e.g. sieve), probabilistic (Fermat and Miller-Rabin), or relatively difficult to implement (AKS).
What is the best deterministic solution to determine whether or not a number is prime?
Note that I am primarily (pun intended) interested in testing against numbers on the order of 32 (and maybe 64) bits. So a robust solution (applicable to larger numbers) is not necessary.
Up to ~2^30 you could brute force with trial-division.
Up to 3.4*10^14, Rabin-Miller with the first 7 primes has been proven to be deterministic.
Above that, you're on your own. There's no known sub-cubic deterministic algorithm.
EDIT : I remembered this, but I didn't find the reference until now:
http://reference.wolfram.com/legacy/v5_2/book/section-A.9.4
PrimeQ first tests for divisibility using small primes, then uses the
Miller-Rabin strong pseudoprime test base 2 and base 3, and then uses
a Lucas test.
As of 1997, this procedure is known to be correct only for n < 10^16,
and it is conceivable that for larger n it could claim a composite
number to be prime.
So if you implement Rabin-Miller and Lucas, you're good up to 10^16.
If I didn't care about space, I would try precomputing all the primes below 2^32 (~4e9/ln(4e9)*4 bytes, which is less than 1GB), store them in the memory and use a binary search. You can also play with memory mapping of the file containing these precomputed primes (pros: faster program start, cons: will be slow until all the needed data is actually in the memory).
If you can factor n-1 it is easy to prove that n is prime, using a method developed by Edouard Lucas in the 19th century. You can read about the algorithm at Wikipedia, or look at my implementation of the algorithm at my blog. There are variants of the algorithm that require only a partial factorization.
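For illustration, here is a minimal sketch of the Lucas test in its textbook form: n is prime if some a satisfies a^(n-1) = 1 (mod n) while a^((n-1)/q) != 1 (mod n) for every prime q dividing n-1. This is not the blog implementation mentioned above; the function names are invented, and n-1 is factored by plain trial division, which assumes n-1 is easy to factor.

```python
from typing import Set


def prime_factors(m: int) -> Set[int]:
    """Distinct prime factors of m >= 2 by trial division."""
    factors = set()
    d = 2
    while d * d <= m:
        while m % d == 0:
            factors.add(d)
            m //= d
        d += 1
    if m > 1:
        factors.add(m)
    return factors


def lucas_is_prime(n: int) -> bool:
    """Decide primality of n by searching for a Lucas witness."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    qs = prime_factors(n - 1)
    for a in range(2, n):
        if pow(a, n - 1, n) != 1:
            return False  # fails Fermat's little theorem: n is composite
        if all(pow(a, (n - 1) // q, n) != 1 for q in qs):
            return True   # a has order n-1 modulo n, so n is prime
    return False


print(lucas_is_prime(7919), lucas_is_prime(561))
```

For a prime n the loop stops at the first primitive root, which is typically small. For a composite n it stops at the latest when a reaches the smallest prime factor of n, so Carmichael numbers such as 561 are rejected as well.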
If the factorization of n-1 is difficult, the best method is the elliptic curve primality proving algorithm, but that requires more math, and more code, than you may be willing to write. That would be much faster than AKS, in any case.
Are you sure that you need an absolute proof of primality? The Baillie-Wagstaff algorithm is faster than any deterministic primality prover, and there are no known counter-examples.
If you know that n will never exceed 2^64 then strong pseudo-prime tests using the first twelve primes as bases are sufficient to prove n prime. For 32-bit integers, strong pseudo-prime tests to the three bases 2, 7 and 61 are sufficient to prove primality.
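A minimal sketch of the fixed-base approach just described, assuming the bounds quoted above (bases 2, 7 and 61 for 32-bit integers, the first twelve primes below 2^64). The function and constant names are made up for the example. Dividing by the small primes first also takes care of the case where n equals one of the bases.

```python
BASES_32 = (2, 7, 61)
BASES_64 = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)  # first twelve primes
SMALL_PRIMES = BASES_64 + (61,)


def is_strong_probable_prime(n: int, a: int) -> bool:
    """Strong pseudo-prime test of odd n > 2 to base a (a not divisible by n)."""
    d, s = n - 1, 0
    while d % 2 == 0:  # write n - 1 as d * 2**s with d odd
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False


def is_prime(n: int) -> bool:
    """Deterministic primality test for 0 <= n < 2**64."""
    if n >= 1 << 64:
        raise ValueError("only valid below 2**64")
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    bases = BASES_32 if n < 1 << 32 else BASES_64
    return all(is_strong_probable_prime(n, a) for a in bases)


print(is_prime(4294967291), is_prime(4294967297))
```

The first number printed is the largest prime below 2^32; the second is the Fermat number 2^32 + 1 = 641 * 6700417.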
Use the Sieve of Eratosthenes to pre-calculate as many primes as you have space for. You can fit in a lot at one bit per number and halve the space by only sieving odd numbers (treating 2 as a special case).
For numbers from Sieve.MAX_NUM up to the square of Sieve.MAX_NUM you can use trial division because you already have the required primes listed. Judicious use of Miller-Rabin on larger unfactored residues can speed up the process a lot.
For numbers larger than that I would use one of the probabilistic tests. Miller-Rabin is good, and if repeated a few times it can give results that are less likely to be wrong than a hardware failure in the computer you are running on.
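The first two steps could look roughly like the sketch below: an odd-only sieve packed at one bit per number, plus trial division by the sieved primes for anything up to the square of the limit. The class layout, the default MAX_NUM value and the method names are assumptions made for illustration; the Miller-Rabin part is left out.

```python
import math
from typing import Iterator


class Sieve:
    """Odd-only Sieve of Eratosthenes; bit i stands for the odd number 2*i + 1."""

    MAX_NUM = 1_000_000  # hypothetical default limit

    def __init__(self, max_num: int = MAX_NUM) -> None:
        self.max_num = max_num
        self.half = (max_num + 1) // 2  # count of odd numbers <= max_num
        self.bits = bytearray(b"\xff") * ((self.half + 7) // 8)
        self._clear(0)                  # 1 is not prime
        for i in range(1, (math.isqrt(max_num) + 1) // 2):
            if self._get(i):
                p = 2 * i + 1
                # start at p*p; a step of p in index space is 2*p in value
                for j in range(p * p // 2, self.half, p):
                    self._clear(j)

    def _get(self, i: int) -> int:
        return (self.bits[i >> 3] >> (i & 7)) & 1

    def _clear(self, i: int) -> None:
        self.bits[i >> 3] &= ~(1 << (i & 7)) & 0xFF

    def odd_primes(self) -> Iterator[int]:
        for i in range(1, self.half):
            if self._get(i):
                yield 2 * i + 1

    def is_prime(self, n: int) -> bool:
        if n < 2:
            return False
        if n % 2 == 0:
            return n == 2  # 2 is the special case
        if n <= self.max_num:
            return bool(self._get(n // 2))
        if n > self.max_num ** 2:
            raise ValueError("n is beyond the square of the sieve limit")
        for p in self.odd_primes():  # trial division by sieved primes
            if p * p > n:
                break
            if n % p == 0:
                return False
        return True


sieve = Sieve(100)
print(sieve.is_prime(97), sieve.is_prime(9973), sieve.is_prime(9991))
```

With a limit of 100 the sieve stores only 50 bits, yet it can still decide primality for every number up to 10000.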

Resources