Why is integer factorization not a polynomial-time problem?

I am just a beginner in computer science. I have learned a bit about running time, but I can't be sure that what I understood is right. So please help me.
Integer factorization is currently not known to be a polynomial-time problem, but primality testing is. Assume the number to be checked is n. We could run a program that checks, for every number from 2 to sqrt(n), whether it divides n, and stores each divisor it finds. I think this program is polynomial time, isn't it?
One way I might be wrong is that a factorization program should find all prime factors, not just the first one discovered. So maybe that is the reason.
However, in public-key cryptography, finding a prime factor of a large number is the key to attacking the system. Since the large number (the public key) is usually the product of just two primes, finding one prime factor means finding the other. By the reasoning above, this should be polynomial time. So why is it difficult or impossible to attack?

Casual descriptions of complexity like "polynomial factoring algorithm" generally refer to the complexity with respect to the size of the input, not the value the input represents. So when people say "no known polynomial factoring algorithm", they mean there is no known algorithm for factoring N-bit natural numbers that runs in time polynomial with respect to N, not polynomial with respect to the number itself, which can be as large as 2^N.
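To make that concrete, here is a minimal sketch of the trial-division idea from the question (Python for illustration; the helper name smallest_factor is ours). Its loop runs about sqrt(n) times, which is roughly 2^(N/2) steps for an N-bit input: polynomial in the value of n, but exponential in its size N.

```python
import math

def smallest_factor(n: int) -> int:
    """Return the smallest factor of n greater than 1 (n itself if n is prime)."""
    # About sqrt(n) iterations: fine for small n, hopeless for a 2048-bit modulus.
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return d
    return n
```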

The difficulty of factorization is one of those beautiful mathematical problems that is simple to understand and takes you immediately to the edge of human knowledge. To summarize (today's) knowledge on the subject: we don't know why it's hard, not with any degree of proof, and the best methods we have run in more than polynomial time (but also in significantly less than exponential time). Even the result that primality testing is in P is fairly recent: the AKS primality test, published in 2002.
The best heuristic explanation I know for the difficulty is that primes are randomly distributed. One of the easier-to-understand results is Dirichlet's theorem. This theorem says that every arithmetic progression (whose first term and common difference are coprime) contains infinitely many primes; in other words, you can think of primes as being dense with respect to progressions, meaning you can't avoid running into them. This is the simplest of a rather large collection of such results; in all of them, primes appear in ways very much analogous to random numbers.
The difficulty of factoring is thus analogous to the impossibility of reversing a one-time pad. In a one-time pad, a bit we don't know is XORed with another bit we don't know. Knowing the result of the XOR gives us zero information about either individual bit. Replace "bit" with "prime" and XOR with multiplication, and you have the factoring problem. It's as if you've multiplied two random numbers together, and you get very little information from the product (instead of zero information).

We could run a program that checks, for every number from 2 to sqrt(n), whether it divides n, and stores each divisor it finds.
Even ignoring that each divisibility test takes longer for bigger numbers, this approach takes about 1.4 times (a factor of sqrt(2)) as long if you add a single binary digit to n, and twice as long if you add two digits.
That is essentially the definition of exponential runtime: make n a fixed number of bits longer, and the algorithm takes twice as long.
But note that this observation applies only to the algorithm you proposed. It is still unknown whether integer factorization is polynomial or not. The cryptographers sure hope that it is not, but there are also alternative algorithms that do not depend on prime factorization being hard (such as elliptic curve cryptography), just in case...
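To see the doubling numerically (our own illustration, not part of the answer above): the number of candidate divisors is about isqrt(n), which doubles every time n gains two bits.

```python
import math

for bits in range(20, 29, 2):
    n = 2 ** bits
    # Candidate divisors up to sqrt(n): 1024, 2048, 4096, 8192, 16384.
    print(bits, math.isqrt(n))
```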

Related

Big Theta of Factorial Multiplied by a Coefficient

For a function with a run time of (cn)!, where c is a coefficient >= 0 and c != n, would the tight bound of the run time be Θ(n!) or Θ((cn)!)? Right now, I believe it would be Θ((cn)!), since the two would differ by a factor of at least n, given that cn != n.
Thanks!
Edit: A more specific example to clarify what I'm asking:
Will (7n)!, (5n/16)! and n! all be Θ(n!)?
You can use Stirling's approximation to show that if c > 1 then (cn)! is asymptotically larger than c^n * n!, which is not O(n!) since the quotient diverges. As a more elementary approach, consider this example for c = 2: (2n)! = (2n)(2n-1)...(n+1) * n! > n! * n!, and (n! * n!)/n! = n! diverges, so (2n)! is NOT O(n!).
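A numeric spot-check of that elementary argument (our own illustration):

```python
from math import factorial

for n in (5, 8, 10):
    quotient = factorial(2 * n) // factorial(n)   # = (2n)(2n-1)...(n+1)
    print(n, quotient >= factorial(n), quotient)  # always True, and diverging
```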
Will (7n)!, (5n/16)! and n! all be Θ(n!)?
I think there are two answers to your question.
The shorter one is from the purely theoretical point of view. Of those three, only n! lies in the class Θ(n!). The second, (5n/16)!, lies in O(n!) (note big-O instead of big-Theta), and (7n)! grows faster than Θ(n!); it lies in Θ((7n)!).
There is also a longer but more practical answer. To get to it, we first need to understand what the big deal with this whole big-O and big-Theta business is in the first place.
The thing is that for many practical tasks there are many algorithms, and not all of them are equally or even similarly efficient. So the practical question is: can we somehow capture this difference in performance in a way that is easy to understand and compare? This is the problem that big-O/big-Theta notation is trying to solve. The idea behind this method is that if we look at some algorithm's complicated real formula for the exact running time, there is only one term that grows faster than all the others and thus dominates the time as the problem gets bigger. So let's compress the big formula down to that dominant term. Then we can compare those terms, and if they are different, we can easily say which is the better algorithm (7*n^2 is clearly better than 2*n^3).
Another idea is that the term "operation" is usually not that well defined at the level at which people usually think about algorithms. Which "operation" maps to a single CPU instruction and which to a few depends on many factors, such as the particular hardware. Also, the instructions themselves can take different amounts of time to execute. Moreover, sometimes the algorithm's working time is dominated by memory accesses rather than CPU instructions, and those components are not easily additive. The moral of this story is that if two algorithms differ only in a scalar coefficient, you can't really compare them theoretically; you need to compare implementations in some particular environment. This is why measures of algorithmic complexity typically boil down to something like O(n^k), where k is a constant.
There is one more consideration: practicality. If the algorithm is polynomial, there is a huge practical difference between the cases a = 3 and a = 4 in O(n^a). But if it is something like O(2^(n^a)), then it doesn't matter much what exactly a is, as long as a > 1. This is because 2^n grows fast enough to make the algorithm impractical for almost any realistic n, irrespective of a. So in practical terms it is often a good enough approximation to put all such algorithms into a single "exponential algorithms" bucket and say they are all impractical, even though there are huge differences between them. This is where some mathematically unconventional notations like 2^O(n) come from.
From this last practical perspective, the difference between Θ(n!) and Θ((7n)!) is also very small: both are totally impractical, because both lie beyond even the exponential bucket of 2^O(n) (see Stirling's formula, which shows that n! grows a bit faster than (n/e)^n). So it makes sense to put all such algorithms into yet another bucket, "factorial complexity", and mark them as impractical as well.
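A quick spot-check of the factorial-beyond-exponential claim (our own illustration): by Stirling, log2(n!) is about n*log2(n/e), so log2(n!)/n keeps growing, whereas for any algorithm in 2^O(n) that ratio would stay bounded.

```python
import math

for n in (10, 100, 1000, 10_000):
    # lgamma(n + 1) = ln(n!), so this prints log2(n!) / n, which keeps growing.
    print(n, math.lgamma(n + 1) / math.log(2) / n)
```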

What is the most efficient algorithm to generate prime numbers, up to very high values (all a 32-bit machine can handle)?

My program is supposed to loop forever and print every prime number it comes across. I am doing this in x86 NASM, by the way.
My first attempt divided each candidate by EVERY previous number until either the remainder is 0 (not a prime) or the result is 1.
My second attempt improved on this by testing only every second number, i.e. only odd numbers.
The third thing, which I am currently implementing, is to divide not by EVERY previous number but only by the previous numbers up to half the candidate, since no number has a divisor (other than itself) bigger than its half.
Another thing that might help is to test only against odd numbers, like a Sieve of Eratosthenes that excludes the even numbers.
Anyway, if there is anything else I can do, all help is welcome.
If you need to test just a handful of primes, possibly only one, the AKS primality test is polynomial in the length of n.
If you want to find a very big prime, of cryptographic size, then select a random range of odd numbers and sieve out all the numbers with small prime factors (e.g. primes up to 64K-240K), then test the remaining numbers for primality.
If you want to find all the primes in a range, use a sieve: the Sieve of Eratosthenes is very easy to implement but runs slower and requires more memory.
The Sieve of Atkin is faster; the wheel sieve requires far less memory.
The size of the problem is exponential if approached naively, so before micro-optimising it is mandatory to macro-optimise first.
More or less all prime-number algorithms require familiarity with number theory, so pay particular attention to the group/ring/field the algorithm is working in, because mathematicians write operations like inversion and multiplication with the same symbols for all algebraic structures.
Once you have a fast algorithm, you can start micro-optimising.
At that level it is really impossible to give general advice on how to proceed with such optimisations.
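For reference, a minimal odds-only Sieve of Eratosthenes sketch (in Python for clarity, since the question is about the algorithm rather than the NASM code; treating 2 as a special case and letting index i stand for the odd number 2i+1):

```python
def odd_sieve(limit: int) -> list:
    """Return all primes <= limit, sieving only odd numbers."""
    if limit < 2:
        return []
    size = (limit - 1) // 2             # indices 1..size cover 3, 5, ..., <= limit
    is_prime = [True] * (size + 1)      # is_prime[i] <=> 2*i + 1 is prime (i >= 1)
    primes = [2]                        # 2 handled as a special case
    for i in range(1, size + 1):
        if is_prime[i]:
            p = 2 * i + 1
            primes.append(p)
            # Start crossing off at p*p = (2i+1)^2, whose index is 2*i*(i+1);
            # consecutive odd multiples of p are p indices apart.
            for j in range(2 * i * (i + 1), size + 1, p):
                is_prime[j] = False
    return primes
```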

Simple deterministic primality testing for small numbers

I am aware that there are a number of primality testing algorithms used in practice (Sieve of Eratosthenes, Fermat's test, Miller-Rabin, AKS, etc.). However, they are either slow (e.g. the sieve), probabilistic (Fermat and Miller-Rabin), or relatively difficult to implement (AKS).
What is the best deterministic solution to determine whether or not a number is prime?
Note that I am primarily (pun intended) interested in testing against numbers on the order of 32 (and maybe 64) bits. So a robust solution (applicable to larger numbers) is not necessary.
Up to ~2^30 you could brute force with trial-division.
Up to 3.4*10^14, Rabin-Miller with the first 7 primes has been proven to be deterministic.
Above that, you're on your own. There's no known sub-cubic deterministic algorithm.
EDIT: I remembered this, but I didn't find the reference until now:
http://reference.wolfram.com/legacy/v5_2/book/section-A.9.4
PrimeQ first tests for divisibility using small primes, then uses the Miller-Rabin strong pseudoprime test base 2 and base 3, and then uses a Lucas test.
As of 1997, this procedure is known to be correct only for n < 10^16, and it is conceivable that for larger n it could claim a composite number to be prime.
So if you implement Rabin-Miller and Lucas, you're good up to 10^16.
If I didn't care about space, I would try precomputing all the primes below 2^32 (~4e9/ln(4e9)*4 bytes, which is less than 1 GB), storing them in memory and using binary search. You can also play with memory-mapping the file containing these precomputed primes (pros: faster program start; cons: it will be slow until all the needed data is actually in memory).
If you can factor n-1, it is easy to prove that n is prime using a method developed by Edouard Lucas in the 19th century. You can read about the algorithm on Wikipedia, or look at my implementation of it on my blog. There are variants of the algorithm that require only a partial factorization.
If the factorization of n-1 is difficult, the best method is the elliptic curve primality proving algorithm, but that requires more math, and more code, than you may be willing to write. That would be much faster than AKS, in any case.
Are you sure that you need an absolute proof of primality? The Baillie-Wagstaff algorithm is faster than any deterministic primality prover, and there are no known counter-examples.
If you know that n will never exceed 2^64 then strong pseudo-prime tests using the first twelve primes as bases are sufficient to prove n prime. For 32-bit integers, strong pseudo-prime tests to the three bases 2, 7 and 61 are sufficient to prove primality.
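A sketch of that last test (our own implementation of the standard Miller-Rabin routine, hard-coded to the bases 2, 7 and 61, which suffice for 32-bit integers):

```python
def is_prime_32(n: int) -> bool:
    if n < 2:
        return False
    for p in (2, 7, 61):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in (2, 7, 61):
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False   # a is a witness that n is composite
    return True
```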
Use the Sieve of Eratosthenes to pre-calculate as many primes as you have space for. You can fit in a lot at one bit per number and halve the space by only sieving odd numbers (treating 2 as a special case).
For numbers from Sieve.MAX_NUM up to the square of Sieve.MAX_NUM you can use trial division because you already have the required primes listed. Judicious use of Miller-Rabin on larger unfactored residues can speed up the process a lot.
For numbers larger than that I would use one of the probabilistic tests. Miller-Rabin is good, and if repeated a few times it can give results that are less likely to be wrong than a hardware failure in the computer you are running it on.

What is the most efficient algorithm to find the closest prime less than a given number n?

Problem
Given a number n, 2 <= n <= 2^63; n could be prime itself. Find the prime p that is closest to, and less than, n.
Using the fact that every prime p > 3 is odd and of the form 6k+1 or 6k+5, one could write a loop from n-1 down to 2 checking each number for primality. So instead of checking all numbers, I need to check only the odd numbers of the two forms above. However, I wonder if there is a faster algorithm to solve this problem, i.e. some constraint that can restrict the range of numbers that need to be checked? Any idea would be greatly appreciated.
In reality, the odds of finding a prime number are "high", so brute-force checking while skipping "trivial" candidates (numbers divisible by small primes) is going to be your best approach, given what we know about number theory to date.
[update] A mild optimization you might apply is similar to the Sieve of Eratosthenes: define some small smoothness bound, mark all numbers in a range around n that are divisible by those small primes as composite, and test only the numbers relatively prime to your smooth base. You will need to make your range and smoothness bound small enough so as not to eclipse the runtime of the comparatively "expensive" primality test.
The biggest optimization that you can do is to use a fast primality check before doing a full test. For instance see http://en.wikipedia.org/wiki/Miller%E2%80%93Rabin_primality_test for a commonly used test that will quickly eliminate most numbers as "probably not prime". Only after you have good reason to believe that a number is prime should you attempt to properly prove primality. (For many purposes people are happy to just accept that if it passes a fixed number of trials of the Rabin-Miller test, it is so likely to be prime that you can just accept that fact.)
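Putting both answers together, a minimal sketch (ours): step downward from n over odd candidates and filter with Miller-Rabin; with the first twelve primes as bases the test is fully deterministic for n < 2^64, which covers the stated range.

```python
def is_prime_64(n: int) -> bool:
    # Deterministic Miller-Rabin for n < 2**64 (first twelve primes as bases).
    if n < 2:
        return False
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def prev_prime(n: int) -> int:
    """Largest prime strictly less than n (requires n > 2)."""
    candidate = n - 1
    if candidate % 2 == 0 and candidate > 2:
        candidate -= 1                 # only odd candidates; 2 handled at the end
    while candidate > 2:
        if is_prime_64(candidate):
            return candidate
        candidate -= 2
    return 2
```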

Is there such a thing as "negative" big-O complexity? [duplicate]

Possible Duplicate:
Are there any O(1/n) algorithms?
This just popped into my head for no particular reason, and I suppose it's a strange question. Are there any known algorithms or problems which actually get easier or faster to solve with larger input? I'm guessing that if there are, it wouldn't be for things like mutations or sorting; it would be for decision problems. Perhaps there's some problem where having a ton of input makes it easy to decide something, but I can't imagine what.
If there is no such thing as negative complexity, is there a proof that there cannot be? Or is it just that no one has found it yet?
No, that is not possible. Since Big-O is supposed to be an approximation of the number of operations an algorithm performs in relation to its input size, it would not make sense to describe an algorithm as using a negative number of operations.
The formal-definition section of the Wikipedia article actually defines Big-O notation in terms of positive real numbers. So there is not even a proof to be had, because the whole concept of Big-O has no meaning over the negative real numbers, per the formal definition.
Short answer: it's not possible, because the definition says so.
update
Just to make it clear, I'm answering this part of the question: Are there any known algorithms or problems which actually get easier or faster to solve with larger input?
As noted in the accepted answer to the duplicate question, there are no algorithms that work faster with bigger input:
Are there any O(1/n) algorithms?
Even an algorithm like sleep(1/n) has to spend time reading its input, so its running time has a lower bound.
In particular, the author refers to a relatively simple substring-search algorithm:
http://en.wikipedia.org/wiki/Horspool
P.S. But using the term "negative complexity" for such algorithms doesn't seem reasonable to me.
Thinking of an algorithm that executes in negative time is the same as thinking about time going backwards.
If the program starts executing at 10:30 AM and stops at 10:00 AM without passing through 11:00 AM, it has just executed with time = O(-1).
=]
Now, for the mathematical part:
If you can't come up with a sequence of actions that executes backwards in time (you never know... lol), the proof is quite simple:
positiveTime = O(-1) means:
positiveTime <= c * (-1), for some c > 0 and all n > n0 > 0
Consider the "c > 0" restriction. We can't find a positive number that, multiplied by -1, results in another positive number. Taking that into account, this is the result:
positiveTime <= negativeNumber, for all n > n0 > 0
which proves that you can't have an algorithm with O(-1).
Not really. O(1) is the best you can hope for.
The closest I can think of is language translation, which uses large datasets of phrases in the target language to match up smaller snippets from the source language. The larger the dataset, the better (and to a certain extent faster) the translation. But that's still not even O(1).
Well, for many calculations like "given input A, return f(A)" you can "cache" calculation results (store them in an array or map), which will make the calculation faster for a larger number of values, IF some of those values repeat.
But I don't think that qualifies as "negative complexity". In this case the fastest performance will probably count as O(1), the worst-case performance will be O(N), and the average performance will be somewhere in between.
This is somewhat applicable to sorting algorithms: some of them have O(N) best-case complexity and O(N^2) worst-case complexity, depending on the state of the data to be sorted.
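A minimal sketch of the caching idea (ours, using Python's standard functools.lru_cache): repeated inputs become O(1) cache hits, so the average cost per query falls as repeats accumulate, even though no single call is ever "negatively" complex.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(a: int) -> int:
    # Stand-in for an expensive calculation.
    return sum(i * i for i in range(a))

f(100_000)   # first call pays the full cost
f(100_000)   # repeated calls are O(1) lookups
```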
I think that to have negative complexity, an algorithm would have to return the result before it has been asked to calculate it, i.e. it would have to be connected to a time machine and be able to deal with the corresponding "grandfather paradox".
As with the other question about the empty algorithm, this question is a matter of definition rather than a matter of what is possible or impossible. It is certainly possible to think of a cost model for which an algorithm takes O(1/n) time. (That is not negative of course, but rather decreasing with larger input.) The algorithm can do something like sleep(1/n) as one of the other answers suggested. It is true that the cost model breaks down as n is sent to infinity, but n never is sent to infinity; every cost model breaks down eventually anyway. Saying that sleep(1/n) takes O(1/n) time could be very reasonable for an input size ranging from 1 byte to 1 gigabyte. That's a very wide range for any time complexity formula to be applicable.
On the other hand, the simplest, most standard definition of time complexity uses unit time steps. It is impossible for a positive, integer-valued function to have decreasing asymptotics; the smallest it can be is O(1).
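For concreteness, the toy sleep(1/n) routine under discussion (a sketch, not a serious algorithm): its wall-clock time shrinks as n grows, but it still has to read n, so under the standard unit-step model its complexity is bounded below by O(1).

```python
import time

def do_less_with_more(n: int) -> None:
    # Sleeps for 1/n seconds: "faster" for larger n, under a wall-clock cost model.
    time.sleep(1.0 / n)
```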
I don't know if this quite fits, but it reminds me of BitTorrent: the more people are downloading a file, the faster it goes for all of them.
