recurrence using (mod 2^32+1) - algorithm

Where m = 2^32+1 = 641*6700417 the mod function is little more than a single subtract on 32-bit processors. I don’t care that the recurrence
Seed = Seed*a%m
is not a good random number generator. I wish to use it in an encryption algorithm as a 32-bit wide sbox. Is there an algorithm that would return true if a trial value of “a” would cause the recurrence to visit all 2^32 values?
Assuming that such an algorithm exits I suspect that if a*b%m = 1 then the recurrence using “b” would run backwards. Is what I suspect true. I would use “b” to implement the inverse sbox.
I can do everything I ask using mod (2^16+1) but that number is prime.

Is there an algorithm that would return true if a trial value of “a” would cause the recurrence to visit all 232 values?
Yes there is:
return false;
The most obvious reason is that the set of all 232 possible values includes the value zero, and there the recurrence gets stuck, so it isn't cyclic. But even if you exclude zero, if you start with a multiple of 641, then you will only ever visit multiples of 641, and the same holds for the other factor.
This kind of “visit all values” property only works if you reduce modulo some prime, and if you exclude zero.

This is not a very simple question. It is easy to answer that in your case the numbers you use are poor. However, the easy answer: use primes is not the correct answer, either.
If we use the recurrence:
r := (r a) mod m
it is easy to see, that this gives the maximal cycle (m-1) if and only if all a^i mod m give different numbers for i = 0 .. m-2. However, this does not automatically happen even if both a and m are primes.
An example of two primes which cannot be used: a = 13, m = 17, because 13^4 mod 17 == 1, and the cycle will be very short (4 steps).
So, we need some other requirements, as well. To make a very long story short, a generator of this type (multiplicative congruential generator) produces the maximal cycle (m-1) if:
m is prime
a is a primitive root of m
Unfortunately, the latter requirement is a bit difficult, as there is no general formula to find primitive roots. (And please note that a does not have to be a prime, for example the combination m=17 and a=10 gives the full cycle.)
So, despite this being a seemingly simple problem, it touches some rather fundamental aspects of number theory.

Related

Generating Diffie-hellman parameters (generator)

I'm trying to implement a diffie-hellman key exchange. Let's say I found a large prime number p - how can I find a generator g?
Restricted by the multiprecision library that I have to use, only a few basic operations (+, *, -, /, pow, modExp, modMult, mod, gcd, isPrime, genRandomPrime, genRandomBits, and a few more) are available.
Would it work to look for a safe prime q, so that every number n for which gcd(n,q) == 1 should be a generator, right?
You basically answered your question. Just the test gcd(n,q)==1 is not necessary since q is prime. It means that any number n, such that n < q does not have common factor with q and gcd(n,q) will always output 1.
You can check whether q=2p + 1 is prime number. If so, then ord(Zq) = q-1 = (2p+1)-1 = 2p. Since ord(x) | ord(Zq) for every x in Zq ord(x)=2 or ord(x)=p or ord(x)=2p. Thus you just need to check whether your randomly chosen element x from {2,...,q-1} is of order 2. If not then it is of order p or 2p and you can use it as a generator.
As a rule, don't ask programmers questions about cryptography. Cryptography is subtle and, as a result, difficult in invisible ways that lead readily to self-deception about one's own competence. Instead, ask cryptographers (many of which are also programmers). Stack Exchange has a cryptography board, where this question has already been answered.
https://crypto.stackexchange.com/questions/29926/what-diffie-hellman-parameters-should-i-use
I could quibble with the advice there, but it's basically sound. Unless you really want to learn the relevant mathematics, I'd defer to authorities; they're cited in the answer above.
As to the mathematics question you ask, here's a tiny introduction. The multiplicative group modulo a prime p has size p-1. (See Fermat's Little Theorem.) The order of any element must divide p-1. The most favorable case is where p-1=2q, where q is also prime.
You've already gotten the ritual admonishment not to roll your own crypto if you care at all about your security, so here's how to find a generator mod a safe prime q. A number g in the closed range [2, q - 2] is a generator if and only if g^((q-1)/2) != 1 mod q, which you should compute with the standard algorithm for modular exponentiation. Choose random values of g until one passes the test.

Create a random permutation of 1..N in constant space

I am looking to enumerate a random permutation of the numbers 1..N in fixed space. This means that I cannot store all numbers in a list. The reason for that is that N can be very large, more than available memory. I still want to be able to walk through such a permutation of numbers one at a time, visiting each number exactly once.
I know this can be done for certain N: Many random number generators cycle through their whole state space randomly, but entirely. A good random number generator with state size of 32 bit will emit a permutation of the numbers 0..(2^32)-1. Every number exactly once.
I want to get to pick N to be any number at all and not be constrained to powers of 2 for example. Is there an algorithm for this?
The easiest way is probably to just create a full-range PRNG for a larger range than you care about, and when it generates a number larger than you want, just throw it away and get the next one.
Another possibility that's pretty much a variation of the same would be to use a linear feedback shift register (LFSR) to generate the numbers in the first place. This has a couple of advantages: first of all, an LFSR is probably a bit faster than most PRNGs. Second, it is (I believe) a bit easier to engineer an LFSR that produces numbers close to the range you want, and still be sure it cycles through the numbers in its range in (pseudo)random order, without any repetitions.
Without spending a lot of time on the details, the math behind LFSRs has been studied quite thoroughly. Producing one that runs through all the numbers in its range without repetition simply requires choosing a set of "taps" that correspond to an irreducible polynomial. If you don't want to search for that yourself, it's pretty easy to find tables of known ones for almost any reasonable size (e.g., doing a quick look, the wikipedia article lists them for size up to 19 bits).
If memory serves, there's at least one irreducible polynomial of ever possible bit size. That translates to the fact that in the worst case you can create a generator that has roughly twice the range you need, so on average you're throwing away (roughly) every other number you generate. Given the speed an LFSR, I'd guess you can do that and still maintain quite acceptable speed.
One way to do it would be
Find a prime p larger than N, preferably not much larger.
Find a primitive root of unity g modulo p, that is, a number 1 < g < p such that g^k ≡ 1 (mod p) if and only if k is a multiple of p-1.
Go through g^k (mod p) for k = 1, 2, ..., ignoring the values that are larger than N.
For every prime p, there are φ(p-1) primitive roots of unity, so it works. However, it may take a while to find one. Finding a suitable prime is much easier in general.
For finding a primitive root, I know nothing substantially better than trial and error, but one can increase the probability of a fast find by choosing the prime p appropriately.
Since the number of primitive roots is φ(p-1), if one randomly chooses r in the range from 1 to p-1, the expected number of tries until one finds a primitive root is (p-1)/φ(p-1), hence one should choose p so that φ(p-1) is relatively large, that means that p-1 must have few distinct prime divisors (and preferably only large ones, except for the factor 2).
Instead of randomly choosing, one can also try in sequence whether 2, 3, 5, 6, 7, 10, ... is a primitive root, of course skipping perfect powers (or not, they are in general quickly eliminated), that should not affect the number of tries needed greatly.
So it boils down to checking whether a number x is a primitive root modulo p. If p-1 = q^a * r^b * s^c * ... with distinct primes q, r, s, ..., x is a primitive root if and only if
x^((p-1)/q) % p != 1
x^((p-1)/r) % p != 1
x^((p-1)/s) % p != 1
...
thus one needs a decent modular exponentiation (exponentiation by repeated squaring lends itself well for that, reducing by the modulus on each step). And a good method to find the prime factor decomposition of p-1. Note, however, that even naive trial division would be only O(√p), while the generation of the permutation is Θ(p), so it's not paramount that the factorisation is optimal.
Another way to do this is with a block cipher; see this blog post for details.
The blog posts links to the paper Ciphers with Arbitrary Finite Domains which contains a bunch of solutions.
Consider the prime 3. To fully express all possible outputs, think of it this way...
bias + step mod prime
The bias is just an offset bias. step is an accumulator (if it's 1 for example, it would just be 0, 1, 2 in sequence, while 2 would result in 0, 2, 4) and prime is the prime number we want to generate the permutations against.
For example. A simple sequence of 0, 1, 2 would be...
0 + 0 mod 3 = 0
0 + 1 mod 3 = 1
0 + 2 mod 3 = 2
Modifying a couple of those variables for a second, we'll take bias of 1 and step of 2 (just for illustration)...
1 + 2 mod 3 = 0
1 + 4 mod 3 = 2
1 + 6 mod 3 = 1
You'll note that we produced an entirely different sequence. No number within the set repeats itself and all numbers are represented (it's bijective). Each unique combination of offset and bias will result in one of prime! possible permutations of the set. In the case of a prime of 3 you'll see that there are 6 different possible permuations:
0,1,2
0,2,1
1,0,2
1,2,0
2,0,1
2,1,0
If you do the math on the variables above you'll not that it results in the same information requirements...
1/3! = 1/6 = 1.66..
... vs...
1/3 (bias) * 1/2 (step) => 1/6 = 1.66..
Restrictions are simple, bias must be within 0..P-1 and step must be within 1..P-1 (I have been functionally just been using 0..P-2 and adding 1 on arithmetic in my own work). Other than that, it works with all prime numbers no matter how large and will permutate all possible unique sets of them without the need for memory beyond a couple of integers (each technically requiring slightly less bits than the prime itself).
Note carefully that this generator is not meant to be used to generate sets that are not prime in number. It's entirely possible to do so, but not recommended for security sensitive purposes as it would introduce a timing attack.
That said, if you would like to use this method to generate a set sequence that is not a prime, you have two choices.
First (and the simplest/cheapest), pick the prime number just larger than the set size you're looking for and have your generator simply discard anything that doesn't belong. Once more, danger, this is a very bad idea if this is a security sensitive application.
Second (by far the most complicated and costly), you can recognize that all numbers are composed of prime numbers and create multiple generators that then produce a product for each element in the set. In other words, an n of 6 would involve all possible prime generators that could match 6 (in this case, 2 and 3), multiplied in sequence. This is both expensive (although mathematically more elegant) as well as also introducing a timing attack so it's even less recommended.
Lastly, if you need a generator for bias and or step... why don't you use another of the same family :). Suddenly you're extremely close to creating true simple-random-samples (which is not easy usually).
The fundamental weakness of LCGs (x=(x*m+c)%b style generators) is useful here.
If the generator is properly formed then x%f is also a repeating sequence of all values lower than f (provided f if a factor of b).
Since bis usually a power of 2 this means that you can take a 32-bit generator and reduce it to an n-bit generator by masking off the top bits and it will have the same full-range property.
This means that you can reduce the number of discard values to be fewer than N by choosing an appropriate mask.
Unfortunately LCG Is a poor generator for exactly the same reason as given above.
Also, this has exactly the same weakness as I noted in a comment on #JerryCoffin's answer. It will always produce the same sequence and the only thing the seed controls is where to start in that sequence.
Here's some SageMath code that should generate a random permutation the way Daniel Fischer suggested:
def random_safe_prime(lbound):
while True:
q = random_prime(lbound, lbound=lbound // 2)
p = 2 * q + 1
if is_prime(p):
return p, q
def random_permutation(n):
p, q = random_safe_prime(n + 2)
while True:
r = randint(2, p - 1)
if pow(r, 2, p) != 1 and pow(r, q, p) != 1:
i = 1
while True:
x = pow(r, i, p)
if x == 1:
return
if 0 <= x - 2 < n:
yield x - 2
i += 1

Most efficient algorithm to compute a common numerator of a sum of fractions

I'm pretty sure that this is the right site for this question, but feel free to move it to some other stackexchange site if it fits there better.
Suppose you have a sum of fractions a1/d1 + a2/d2 + … + an/dn. You want to compute a common numerator and denominator, i.e., rewrite it as p/q. We have the formula
p = a1*d2*…*dn + d1*a2*d3*…*dn + … + d1*d2*…d(n-1)*an
q = d1*d2*…*dn.
What is the most efficient way to compute these things, in particular, p? You can see that if you compute it naïvely, i.e., using the formula I gave above, you compute a lot of redundant things. For example, you will compute d1*d2 n-1 times.
My first thought was to iteratively compute d1*d2, d1*d2*d3, … and dn*d(n-1), dn*d(n-1)*d(n-2), … but even this is inefficient, because you will end up computing multiplications in the "middle" twice (e.g., if n is large enough, you will compute d3*d4 twice).
I'm sure this problem could be expressed somehow using maybe some graph theory or combinatorics, but I haven't studied enough of that stuff to have a good feel for it.
And one note: I don't care about cancelation, just the most efficient way to multiply things.
UPDATE:
I should have known that people on stackoverflow would be assuming that these were numbers, but I've been so used to my use case that I forgot to mention this.
We cannot just "divide" out an from each term. The use case here is a symbolic system. Actually, I am trying to fix a function called .as_numer_denom() in the SymPy computer algebra system which presently computes this the naïve way. See the corresponding SymPy issue.
Dividing out things has some problems, which I would like to avoid. First, there is no guarantee that things will cancel. This is because mathematically, (a*b)**n != a**n*b**n in general (if a and b are positive it holds, but e.g., if a == b ==-1 and n == 1/2, you get (a*b)**n == 1**(1/2) == 1 but (-1)**(1/2)*(-1)**(1/2) == I*I == -1). So I don't think it's a good idea to assume that dividing by an will cancel it in the expression (this may be actually be unfounded, I'd need to check what the code does).
Second, I'd like to also apply a this algorithm to computing the sum of rational functions. In this case, the terms would automatically be multiplied together into a single polynomial, and "dividing" out each an would involve applying the polynomial division algorithm. You can see in this case, you really do want to compute the most efficient multiplication in the first place.
UPDATE 2:
I think my fears for cancelation of symbolic terms may be unfounded. SymPy does not cancel things like x**n*x**(m - n) automatically, but I think that any exponents that would combine through multiplication would also combine through division, so powers should be canceling.
There is an issue with constants automatically distributing across additions, like:
In [13]: 2*(x + y)*z*(S(1)/2)
Out[13]:
z⋅(2⋅x + 2⋅y)
─────────────
2
But this is first a bug and second could never be a problem (I think) because 1/2 would be split into 1 and 2 by the algorithm that gets the numerator and denominator of each term.
Nonetheless, I still want to know how to do this without "dividing out" di from each term, so that I can have an efficient algorithm for summing rational functions.
Instead of adding up n quotients in one go I would use pairwise addition of quotients.
If things cancel out in partial sums then the numbers or polynomials stay smaller, which makes computation faster.
You avoid the problem of computing the same product multiple times.
You could try to order the additions in a certain way, to make canceling more likely (maybe add quotients with small denominators first?), but I don't know if this would be worthwhile.
If you start from scratch this is simpler to implement, though I'm not sure it fits as a replacement of the problematic routine in SymPy.
Edit: To make it more explicit, I propose to compute a1/d1 + a2/d2 + … + an/dn as (…(a1/d1 + a2/d2) + … ) + an/dn.
Compute two new arrays:
The first contains partial multiples to the left: l[0] = 1, l[i] = l[i-1] * d[i]
The second contains partial multiples to the right: r[n-1] = 1, r[i] = d[i] * r[i+1]
In both cases, 1 is the multiplicative identity of whatever ring you are working in.
Then each of your terms on the top, t[i] = l[i-1] * a[i] * r[i+1]
This assumes multiplication is associative, but it need not be commutative.
As a first optimization, you don't actually have to create r as an array: you can do a first pass to calculate all the l values, and accumulate the r values during a second (backward) pass to calculate the summands. No need to actually store the r values since you use each one once, in order.
In your question you say that this computes d3*d4 twice, but it doesn't. It does multiply two different values by d4 (one a right-multiplication and the other a left-multiplication), but that's not exactly a repeated operation. Anyway, the total number of multiplications is about 4*n, vs. 2*n multiplications and n divisions for the other approach that doesn't work in non-commutative multiplication or non-field rings.
If you want to compute p in the above expression, one way to do this would be to multiply together all of the denominators (in O(n), where n is the number of fractions), letting this value be D. Then, iterate across all of the fractions and for each fraction with numerator ai and denominator di, compute ai * D / di. This last term is equal to the product of the numerator of the fraction and all of the denominators other than its own. Each of these terms can be computed in O(1) time (assuming you're using hardware multiplication, otherwise it might take longer), and you can sum them all up in O(n) time.
This gives an O(n)-time algorithm for computing the numerator and denominator of the new fraction.
It was also pointed out to me that you could manually sift out common denominators and combine those trivially without multiplication.

Counting permutation of Strings

I need help with a problem. Given an input string with repetitions, say "aab", how to
count the number of distinct permutations of that string.
One formula that could be used is n!/n1!n2!.....nr!.
However calculating these ni's takes time O(rn) and O(n),if we
use a lookup table.
However I need a solution without use of such tables.Is any recursive or
dynamic programming solution possible for this problem.
Thanks in advance.
no. of distinct permutations will be n!/(c1!*c2*..*cn!)
here n is length of the string
ck denotes the no. of occurence of each distinct character.
For eg: string :aabb n=4 ca=2,cb=2
solution=4!/(2!*2!)=6
If you want to do this for very large strings, consider using the gamma function (with gamma(n+1)=n!), which is faster for large n and still gives you floating-point accuracy even in cases where you would get an int overflow.
If you have arbitrary precision arithmetic, you could probably push the effort down to O(r+n) by exploiting the fact that you can, e.g. write 1*2*3 * 1*2*3*4 * 1*2*3*4*5*6*7 as (1*2*3)^3 * 4^2 * 6*7. The end result will still have O(rn) digits and you'll still have an O(rn) time consumption, because multiplication cost increases with the size of the number.
I don't see the difference between lookup tables and dynamic programming - basically, dynamic programming uses a lookup table that you build on-the-fly. (i.e., use a lookup table, but only populate it on-demand).
Do you need approximate answers, or exact ones? Which part of this calculation do you think is slow?
If you need approximate answers, use the gamma function as #Yannick Versley suggested.
If you need exact answers, here is how I'd do it. I'd first figure out the prime factorization of the answer, then multiply those factors out. This avoids division. The hard part of figuring out the prime factorization is figuring out the prime factorization of n!. For that you can use a trick. Suppose that p is a prime, and k is the integer part of n/p'. Then the number of times thatpdividesn!iskplus the number of times thatpdividesk. Proceed recursively and it is quick to see that, for instance, the number of times that3is a factor of80!is26 + 8 + 2 = 36`. So after you find the primes up to 'n', it isn't hard to find the prime factorization of 'n!'.
Once you know the prime factorization, you can multiply it out. You expect to be dealing with large numbers, so try to arrange to do lots of small multiplications first, and only a few big ones. Here is a simple way to do that.
Make an array of the prime factors. Scramble it (to mix up big and small factors). Then as long as you have at least 2 factors in your array grab the first two, multiply them, push them onto the end. When you have one number left, that is your answer.
This should be much, much faster for large strings than the naive approach of multiplying the numbers one at a time. However in the end you will have very large numbers, and nothing can make multiplying those fast.
You can keep a running counts for each character, and build the result up as you go along. It's impossible to do better than O(n), since without looking at every character in the string you can't know how many of each character there are.
I've written some code in Python, with some simple unit tests. The code carefully avoids large intermediate values when the result is going to be small (in fact, the variable result is never larger than len(s) times the final result). If you were going to code this up in another language, say C, then you might use an array of size 256 rather than the defaultdict.
If you want an exact result, then I don't think you can do better than this.
from collections import defaultdict
def permutations(s):
seen = defaultdict(int)
for c in s:
seen[c] += 1
result = 1
n = 0
for k, count in seen.iteritems():
for j in xrange(count):
n += 1
result *= n
result //= j + 1
return result
test_cases = [
('abc', 6),
('aab', 3),
('abcd', 24),
('aabb', 6),
('aaaaa', 1),
('a', 1)]
for s, want in test_cases:
got = permutations(s)
if got != want:
print 'permutations(%s) = %s want %s' % (s, got, want)
As #MRalwasser says, the number of permutations should be n!. You can generate those permutations fairly simply, but the run time is going to be exponential because you have to hit exponentially many output strings. (Quick way to show O(n!) = O(2n) is by using Stirling's Formula.)

Prime factor of 300 000 000 000?

I need to find out the prime factors of over 300 billion. I have a function that is adding to the list of them...very slowly! It has been running for about an hour now and i think its got a fair distance to go still. Am i doing it completly wrong or is this expected?
Edit: Im trying to find the largest prime factor of the number 600851475143.
Edit:
Result:
{
List<Int64> ListOfPrimeFactors = new List<Int64>();
Int64 Number = 600851475143;
Int64 DividingNumber = 2;
while (DividingNumber < Number / DividingNumber)
{
if (Number % DividingNumber == 0)
{
ListOfPrimeFactors.Add(DividingNumber);
Number = Number/DividingNumber;
}
else
DividingNumber++;
}
ListOfPrimeFactors.Add(Number);
listBox1.DataSource = ListOfPrimeFactors;
}
}
Are you remembering to divide the number that you're factorizing by each factor as you find them?
Say, for example, you find that 2 is a factor. You can add that to your list of factors, but then you divide the number that you're trying to factorise by that value.
Now you're only searching for the factors of 150 billion. Each time around you should start from the factor you just found. So if 2 was a factor, test 2 again. If the next factor you find is 3, there's no point testing from 2 again.
And so on...
Finding prime factors is difficult using brute force, which sounds like the technique you are using.
Here are a few tips to speed it up somewhat:
Start low, not high
Don't bother testing each potential factor to see whether it is prime--just test LIKELY prime numbers (odd numbers that end in 1,3,7 or 9)
Don't bother testing even numbers (all divisible by 2), or odds that end in 5 (all divisible by 5). Of course, don't actually skip 2 and 5!!
When you find a prime factor, make sure to divide it out--don't continue to use your massive original number. See my example below.
If you find a factor, make sure to test it AGAIN to see if it is in there multiple times. Your number could be 2x2x3x7x7x7x31 or something like that.
Stop when you reach >= sqrt(remaining large number)
Edit: A simple example:
You are finding the factors of 275.
Test 275 for divisibility by 2. Does 275/2 = int(275/2)? No. Failed.
Test 275 for divisibility by 3. Failed.
Skip 4!
Test 275 for divisibility by 5. YES! 275/5 = 55. So your NEW test number is now 55.
Test 55 for divisibility by 5. YES! 55/5 = 11. So your NEW test number is now 11.
BUT 5 > sqrt (11), so 11 is prime, and you can stop!
So 275 = 5 * 5 * 11
Make more sense?
Factoring big numbers is a hard problem. So hard, in fact, that we rely on it to keep RSA secure. But take a look at the wikipedia page for some pointers to algorithms that can help. But for a number that small, it really shouldn't be taking that long, unless you are re-doing work over and over again that you don't have to somewhere.
For the brute-force solution, remember that you can do some mini-optimizations:
Check 2 specially, then only check odd numbers.
You only ever need to check up to the square-root of the number (if you find no factors by then, then the number is prime).
Once you find a factor, don't use the original number to find the next factor, divide it by the found factor, and search the new smaller number.
When you find a factor, divide it through as many times as you can. After that, you never need to check that number, or any smaller numbers again.
If you do all the above, each new factor you find will be prime, since any smaller factors have already been removed.
Here is an XSLT solution!
This XSLT transformation takes 0.109 sec.
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:saxon="http://saxon.sf.net/"
xmlns:f="http://fxsl.sf.net/"
exclude-result-prefixes="xs saxon f"
>
<xsl:import href="../f/func-Primes.xsl"/>
<xsl:output method="text"/>
<xsl:template name="initial" match="/*">
<xsl:sequence select="f:maxPrimeFactor(600851475143)"/>
</xsl:template>
<xsl:function name="f:maxPrimeFactor" as="xs:integer">
<xsl:param name="pNum" as="xs:integer"/>
<xsl:sequence select=
"if(f:isPrime($pNum))
then $pNum
else
for $vEnd in xs:integer(floor(f:sqrt($pNum, 0.1E0))),
$vDiv1 in (2 to $vEnd)[$pNum mod . = 0][1],
$vDiv2 in $pNum idiv $vDiv1
return
max((f:maxPrimeFactor($vDiv1),f:maxPrimeFactor($vDiv2)))
"/>
</xsl:function>
</xsl:stylesheet>
This transformation produces the correct result (the maximum prime factor of 600851475143) in just 0.109 sec.:
6857
The transformation uses the f:sqrt() and f:isPrime() defined in FXSL 2.0 -- a library for functional programming in XSLT. FXSL is itself written entirely in XSLT.
f:isPrime() uses Fermat's little theorem so that it is efficient to determine primeality.
One last thing nobody has mentioned, perhaps because it seems obvious. Every time you find a factor and divide it out, keep trying the factor until it fails.
64 only has one prime factor, 2. You will find that out pretty trivially if you keep dividing out the 2 until you can't anymore.
$ time factor 300000000000 > /dev/null
real 0m0.027s
user 0m0.000s
sys 0m0.001s
You're doing something wrong if it's taking an hour. You might even have an infinite loop somewhere - make sure you're not using 32-bit ints.
The key to understanding why the square root is important, consider that each factor of n below the square root of n has a corresponding factor above it. To see this, consider that if x is factor of n, then x/n = m which means that x/m = n, hence m is also a factor.
I wouldn't expect it to take very long at all - that's not a particularly large number.
Could you give us an example number which is causing your code difficulties?
Here's one site where you can get answers: Factoris - Online factorization service. It can do really big numbers, but it also can factorize algebraic expressions.
The fastest algorithms are sieve algorithms, and are based on arcane areas of discrete mathematics (over my head at least), complicated to implement and test.
The simplest algorithm for factoring is probably (as others have said) the Sieve of Eratosthenes. Things to remember about using this to factor a number N:
general idea: you're checking an increasing sequence of possible integer factors x to see if they evenly divide your candidate number N (in C/Java/Javascript check whether N % x == 0) in which case N is not prime.
you just need to go up to sqrt(N), but don't actually calculate sqrt(N): loop as long as your test factor x passes the test x*x<N
if you have the memory to save a bunch of previous primes, use only those as the test factors (and don't save them if prime P fails the test P*P > N_max since you'll never use them again
Even if you don't save the previous primes, for possible factors x just check 2 and all the odd numbers. Yes, it will take longer, but not that much longer for reasonable sized numbers. The prime-counting function and its approximations can tell you what fraction of numbers are prime; this fraction decreases very slowly. Even for 264 = approx 1.8x1019, roughly one out of every 43 numbers is prime (= one out of every 21.5 odd numbers is prime). For factors of numbers less than 264, those factors x are less than 232 where about one out of every 20 numbers is prime = one out of every 10 odd numbers is prime. So you'll have to test 10 times as many numbers, but the loop should be a bit faster and you don't have to mess around with storing all those primes.
There are also some older and simpler sieve algorithms that a little bit more complex but still fairly understandable. See Dixon's, Shanks' and Fermat's factoring algorithms. I read an article about one of these once, can't remember which one, but they're all fairly straightforward and use algebraic properties of the differences of squares.
If you're just testing whether a number N is prime, and you don't actually care about the factors themselves, use a probabilistic primality test. Miller-Rabin is the most standard one, I think.
I spent some time on this since it just sucked me in. I won't paste the code here just yet. Instead see this factors.py gist if you're curious.
Mind you, I didn't know anything about factoring (still don't) before reading this question. It's just a Python implementation of BradC's answer above.
On my MacBook it takes 0.002 secs to factor the number mentioned in the question (600851475143).
There must obviously be much, much faster ways of doing this. My program takes 19 secs to compute the factors of 6008514751431331. But the Factoris service just spits out the answer in no-time.
The specific number is 300425737571? It trivially factors into 131 * 151 * 673 * 22567.
I don't see what all the fuss is...
Here's some Haskell goodness for you guys :)
primeFactors n = factor n primes
where factor n (p:ps) | p*p > n = [n]
| n `mod` p /= 0 = factor n ps
| otherwise = p : factor (n `div` p) (p:ps)
primes = 2 : filter ((==1) . length . primeFactors) [3,5..]
Took it about .5 seconds to find them, so I'd call that a success.
You could use the sieve of Eratosthenes to find the primes and see if your number is divisible by those you find.
You only need to check it's remainder mod(n) where n is a prime <= sqrt(N) where N is the number you are trying to factor. It really shouldn't take over an hour, even on a really slow computer or a TI-85.
Your algorithm must be FUBAR. This only takes about 0.1s on my 1.6 GHz netbook in Python. Python isn't known for its blazing speed. It does, however, have arbitrary precision integers...
import math
import operator
def factor(n):
"""Given the number n, to factor yield a it's prime factors.
factor(1) yields one result: 1. Negative n is not supported."""
M = math.sqrt(n) # no factors larger than M
p = 2 # candidate factor to test
while p <= M: # keep looking until pointless
d, m = divmod(n, p)
if m == 0:
yield p # p is a prime factor
n = d # divide n accordingly
M = math.sqrt(n) # and adjust M
else:
p += 1 # p didn't pan out, try the next candidate
yield n # whatever's left in n is a prime factor
def test_factor(n):
f = factor(n)
n2 = reduce(operator.mul, f)
assert n2 == n
def example():
n = 600851475143
f = list(factor(n))
assert reduce(operator.mul, f) == n
print n, "=", "*".join(str(p) for p in f)
example()
# output:
# 600851475143 = 71*839*1471*6857
(This code seems to work in defiance of the fact that I don't know enough about number theory to fill a thimble.)
Just to expand/improve slightly on the "only test odd numbers that don't end in 5" suggestions...
All primes greater than 3 are either one more or one less than a multiple of 6 (6x + 1 or 6x - 1 for integer values of x).
It shouldn't take that long, even with a relatively naive brute force. For that specific number, I can factor it in my head in about one second.
You say you don't want solutions(?), but here's your "subtle" hint. The only prime factors of the number are the lowest three primes.
Semi-prime numbers of that size are used for encryption, so I am curious as to what you exactly want to use them for.
That aside, there currently are not good ways to find the prime factorization of large numbers in a relatively small amount of time.

Resources