Tips for Project Euler Problem #78 - algorithm

This is the problem in question: Problem #78
This is driving me crazy. I've been working on this for a few hours now and I've been able to reduce the complexity of finding the number of ways to stack n coins to O(n/2), but even with those improvements and starting from an n for which p(n) is close to one-million, I still can't reach the answer in under a minute. Not at all, actually.
Are there any hints that could help me with this?
Keep in mind that I don't want a full solution and there shouldn't be any functional solutions posted here, so as not to spoil the problem for other people. This is why I haven't included any code either.

Wikipedia can help you here. I assume that the solution you already have is a recursion such as the one in the section "intermediate function". This can be used to find the solution to the Euler problem, but isn't fast.
A much better way is to use the recursion based on the pentagonal number theorem in the next section. The proof of this theorem isn't straightforward, so I don't think the authors of the problem expect you to come up with it by yourself. Rather, it is one of the problems where they expect some literature search.

This problem is really asking to find the first term in the sequence of integer partitions that’s divisible by 1,000,000.
A partition of an integer n is a way of writing n as a sum of positive integers, each ≤ n, regardless of order. The function p(n) denotes the number of partitions of n. Below we show our 5 “coins” as addends and count 7 partitions, that is, p(5)=7.
5 = 5
= 4+1
= 3+2
= 3+1+1
= 2+2+1
= 2+1+1+1
= 1+1+1+1+1
We use a generating function to create the series until we find the required n.
The generating function requires at most 500 so-called generalized pentagonal numbers, given by n(3n – 1)/2 for n = 0, ±1, ±2, ±3, …, the first few of which are 0, 1, 2, 5, 7, 12, 15, 22, 26, 35, … (Sloane’s A001318).
We have the following series, which uses our pentagonal numbers as exponents (by Euler's pentagonal number theorem it is the reciprocal of the generating function of p(n)):
1 - q - q^2 + q^5 + q^7 - q^12 - q^15 + q^22 + q^26 - ...
My blog at blog.dreamshire.com has a Perl program that solves this in under 10 sec.
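As a rough illustration of how the pentagonal series turns into a recurrence (a sketch only, deliberately stopping short of the search the problem asks for): multiplying the series by the generating function of p(n) gives p(n) = p(n-1) + p(n-2) - p(n-5) - p(n-7) + ..., with the generalized pentagonal numbers as offsets. The function name partitions_up_to is made up for this example.

```python
def partitions_up_to(limit):
    """Return [p(0), p(1), ..., p(limit)] using the pentagonal recurrence."""
    p = [1] + [0] * limit              # p(0) = 1 by convention
    for n in range(1, limit + 1):
        total = 0
        k = 1
        while True:
            g1 = k * (3 * k - 1) // 2  # generalized pentagonal number for +k
            if g1 > n:
                break
            sign = 1 if k % 2 else -1  # signs go +, +, -, -, +, +, ...
            total += sign * p[n - g1]
            g2 = k * (3 * k + 1) // 2  # generalized pentagonal number for -k
            if g2 <= n:
                total += sign * p[n - g2]
            k += 1
        p[n] = total
    return p


print(partitions_up_to(10))
```

Each p(n) needs only about sqrt(n) earlier terms, which is where the speed-up over the simpler recursions comes from.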

Have you done problems 31 or 76 yet? They form a nice set, each a generalization of the same base problem. Doing the easier questions may give you insight into a solution for 78.

Here are some hints:
Divisibility by one million is not the same thing as just being larger than one million. 1 million = 1,000,000 = 10^6 = 2^6 * 5^6.
So the question is to find the lowest n such that the prime factorization of p(n) contains at least six 2's and at least six 5's.

Related

Difficulty in thinking a divide and conquer approach

I am self-learning algorithms. As we know Divide and Conquer is one of the algorithm design paradigms. I have studied mergeSort, QuickSort, Karatsuba Multiplication, counting inversions of an array as examples of this particular design pattern. Although it sounds very simple, divides the problems into subproblems, solves each subproblem recursively, and merges the result of each of them, I found it very difficult to develop an idea of how to apply that logic to a new problem. To my understanding, all those above-mentioned canonical examples come up with a very clever trick to solve the problem. For example, I am trying to solve the following problem:
Given a sequence of n numbers such that the difference between two consecutive numbers is constant, find the missing term in logarithmic time.
Example: [5, 7, 9, 11, 15]
Answer: 13
First, I came up with the idea that it can be solved using the divide and conquer approach as the naive approach will take O(n) time. From my understanding of divide and conquer, this is how I approached:
The original problem can be divided into two independent subproblems. I can search for the missing term in the two subproblems recursively. So, I first divide the problem.
leftArray = [5,7,9]
rightArray = [11, 15]
Now it says, I need to solve the subproblems recursively until it becomes trivial to solve. In this case, the subproblem becomes of size 1. If there is only one element, there are 0 missing elements. Now to combine the result. But I am not sure how to do it or how it will solve my original problem.
Definitely, I am missing something crucial here. My question is how to approach when solving this type of divide and conquer problem. Should I come up with a trick like a mergeSort or QuickSort? The more I see the solution to this kind of problem, it feels I am memorizing the approach to solve, not understanding and each problem solves it differently. Any help or suggestion regarding the mindset when solving divide and conquer would be greatly appreciated. I have been trying for a long time to develop my algorithmic skill but I improved very little. Thanks in advance.
You have the right approach. The only missing part is an O(1) way to decide which side you are discarding.
First, note that the numbers in your problem must be ordered, otherwise you can't do better than O(n). There also needs to be at least three numbers, otherwise you wouldn't figure out the "step".
With this understanding in place, you can determine the "step" in O(1) time by examining the initial three terms, and see what's the difference between the consecutive ones. Two outcomes are possible:
Both differences are the same, and
One difference is twice as big as the other.
Case 2 hands you a solution by luck, so we will consider only the first case from now on. With the step in hand, you can determine if the range has a gap in it by subtracting the endpoints, and comparing the result to the number of gaps times the step. If you arrive at the same result, the range does not have a missing term, and can be discarded. When both halves can be discarded, the gap is between them.
As @Sergey Kalinichenko points out, this assumes the incoming set is ordered.
However, if you're certain the input is ordered (which is likely in this case), observe that the value at the nth position should be start + jumpsize * index; this allows you to bisect to find where it shifts.
Example: [5, 7, 9, 11, 15]
Answer: 13
start = 5
jumpsize = 2
check midpoint (index 2): 5 + 2 * 2 -> 9
this is valid, so the shift must be after the midpoint
recurse
You can find the jumpsize by checking the first 3 values
a, b, c = (language-dependent retrieval)
gap1 = b - a
gap2 = c - b
if gap1 != gap2:
    if (value at 4th index) - c == gap1:
        missing value is b + gap1 # 2nd gap doesn't match
    else:
        missing value is a + gap2 # 1st gap doesn't match
bisect remaining values
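The bisection described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the input is sorted, has at least three terms, a non-zero step, and exactly one interior term is missing. Unlike the pseudocode, it infers the step from the two endpoints instead of the first three values; find_missing_term is a name made up for this example.

```python
def find_missing_term(seq):
    """Return the single missing term of a sorted arithmetic progression."""
    if len(seq) < 3:
        raise ValueError("need at least three terms to infer the step")
    # The complete progression would have len(seq) + 1 terms, i.e. len(seq) gaps.
    step = (seq[-1] - seq[0]) // len(seq)
    lo, hi = 0, len(seq) - 1
    # Invariant: seq[lo] sits where it should, seq[hi] is shifted by one step.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if seq[mid] == seq[0] + step * mid:
            lo = mid      # everything up to mid is in place, gap is to the right
        else:
            hi = mid      # mid is already shifted, gap is to the left
    return seq[lo] + step


print(find_missing_term([5, 7, 9, 11, 15]))
```

Each iteration halves the range and does O(1) work, so the search is logarithmic; the "combine" step of the divide and conquer is trivial because one half is always discarded.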

Number of positive integers in [1,1e18] that cannot be divided by any integers in [2,10]

I am having difficulty trying to solve the following problem:
For Q queries, Q <= 1e6, where each query is a positive integer N, N <= 1e18, find the number of integers in [1,N] that cannot be divided by integers in [2,10] for each query.
I thought of using a sieve method to filter out numbers in [1,1e18] for each query (similar to the sieve of Eratosthenes). However, the value of N could be very large. Hence, there is no way I could use this method. The most useful observation that I could make is that numbers ending with 0,2,4,5,6,8 are invalid. But that does not help me with this problem.
I saw a solution for a similar problem that uses a smaller number of queries (Q <= 200). But it doesn't work for this problem (and I don't understand that solution).
Could someone please advise me on how to solve this problem?
The only numbers in [2,10] that matter are the primes: 2, 3, 5, 7.
So a number that cannot be divided by any integer in [2,10] is exactly a number that cannot be divided by any of {2,3,5,7}.
That count equals the total number of integers in [1,n] minus the number of integers divisible by at least one of {2,3,5,7}.
So, this is the fun part: in [1,n], how many numbers are divisible by 2?
The answer is floor(n/2) (why? Simple: one out of every 2 consecutive numbers is divisible by 2).
Similarly, how many numbers are divisible by 5? The answer is floor(n/5).
...
So, do we have our answer yet? No, because we have double-counted the numbers divisible by both members of a pair such as {2, 5} or {2, 7} ..., so now we need to subtract them.
But wait, it seems we have now subtracted too much for numbers divisible by a triple such as {2,5,7} ..., so we need to add those back.
...
Keep doing this until all combinations are taken care of.
There are 2^4 combinations, 16 in total, which is pretty small to deal with.
Take a look at Inclusion-Exclusion principle for some good understanding.
Good luck!
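A minimal Python sketch of the inclusion-exclusion count described above, assuming Python 3.8+ for math.prod (count_not_divisible is a name made up for this example):

```python
from itertools import combinations
from math import prod

PRIMES = (2, 3, 5, 7)


def count_not_divisible(n):
    """Count the x in [1, n] that are divisible by none of 2, 3, 5, 7."""
    total = 0
    for size in range(len(PRIMES) + 1):
        for subset in combinations(PRIMES, size):
            # n // prod(subset) counts the multiples of every prime in the subset;
            # the empty subset has product 1 and contributes n itself.
            total += (-1) ** size * (n // prod(subset))
    return total


print(count_not_divisible(10))
```

Every query costs 16 integer divisions, so 1e6 queries with N up to 1e18 stay cheap.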
Here is an approach on how to handle this.
The place to start is to think about how you can split this into pieces. With such a problem, a place to start is the least common multiple (LCM) -- in this case 2,520 (the smallest number divisible by all the numbers from 1 to 10).
The idea is that if x is not divisible by any number from 2-10, then x + 2,520 is also not divisible.
Hence, you can divide the problem into two pieces:
How many numbers between 1 and 2,520 are "relatively prime" to the numbers from 2-10?
How many times does 2,520 go into your target number? You need to take the remainder into account as well.
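The two pieces above could be combined along these lines. This is a sketch, not a tuned solution: it precomputes a prefix table over one period of 2,520 and then answers each query with one divmod; the names prefix and count_by_period are made up for this example.

```python
PERIOD = 2520  # lcm(1, 2, ..., 10)

# prefix[r] = how many x in [1, r] have no divisor in [2, 10]
prefix = [0] * (PERIOD + 1)
for x in range(1, PERIOD + 1):
    ok = all(x % d != 0 for d in range(2, 11))
    prefix[x] = prefix[x - 1] + (1 if ok else 0)


def count_by_period(n):
    """Count the x in [1, n] not divisible by any integer in [2, 10]."""
    full_blocks, remainder = divmod(n, PERIOD)
    # Every full block of 2,520 consecutive integers contributes the same count;
    # the leftover integers behave like the residues 1..remainder.
    return full_blocks * prefix[PERIOD] + prefix[remainder]


print(count_by_period(10))
```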

Sum of numbers making a sequence

While watching the rugby last night I was wondering if any scores were impossible given you can only score points in lots of 3, 5 or 7. It didn't take long to work out that any number greater than 4 is attainable. 5=5, 6=3+3, 7=7, 8=3+5, 9=3+3+3, 10=5+5 and so on.
Extending on that idea for 5, 7 and 9 yields the following possible scores:
5,7,9,10,12,14 // and now all numbers are possible.
For 7, 9 and 11:
7,9,11,14,16,18,20,21,22,23,25,27 // all possible from here
I did these in my head. Can anyone suggest a good algorithm that would determine the lowest possible score above which all scores are attainable, given a set of scores?
I modelled it like this:
forall a < 10:
    forall b < 10:
        forall c < 10:
            list.add(3a + 5b + 7c);
list.sort_smallest_first();
Then check the list for a sequence longer than 3 (the smallest score possible). Seems pretty impractical and slow for anything beyond the trivial case.
There is only one unattainable number above which all scores are attainable.
This is called the Frobenius number. See: http://en.wikipedia.org/wiki/Frobenius_number
The wiki page should have links for algorithms to solve it, for instance: http://www.combinatorics.org/Volume_12/PDF/v12i1r27.pdf
For 2 numbers a,b an exact formula (ab-a-b) is known (which you could use to cut down your search space), and for 3 numbers a,b,c a sharp lower bound (sqrt(3abc)-a-b-c) and quite fast algorithms are known.
If the numbers are in arithmetic progression, then an exact formula is known (see wiki). I mention this because in your examples all numbers are in arithmetic progression.
So to answer your question, find the Frobenius number and add 1 to it.
Hope that helps.
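For small score values, a simple dynamic-programming sketch is enough to find the Frobenius number. This is not one of the fast algorithms referenced above, just a brute-force illustration; frobenius_number is a name made up for this example, and it assumes the scores are positive integers with gcd 1.

```python
from functools import reduce
from math import gcd


def frobenius_number(scores):
    """Largest total that cannot be written as a sum of the given scores."""
    if reduce(gcd, scores) != 1:
        raise ValueError("gcd of scores must be 1, otherwise infinitely many totals are unattainable")
    smallest = min(scores)
    reachable = [True]           # reachable[t]: can total t be scored? 0 is trivially reachable
    largest_unreachable = -1
    run = 1                      # length of the current run of consecutive reachable totals
    total = 0
    # Once `smallest` consecutive totals are reachable, every later total is too
    # (keep adding the smallest score), so the search can stop.
    while run < smallest:
        total += 1
        ok = any(total >= s and reachable[total - s] for s in scores)
        reachable.append(ok)
        if ok:
            run += 1
        else:
            run = 0
            largest_unreachable = total
    return largest_unreachable


print(frobenius_number([3, 5, 7]), frobenius_number([5, 7, 9]), frobenius_number([7, 9, 11]))
```

Adding 1 to the returned value gives the lowest score from which everything is attainable.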

Solving N-Queens Problem... How far can we go?

The N-Queens Problem:
This problem states that given a chess board of size N by N, find the different permutations in which N queens can be placed on the board without any one threatening each other.
My question is:
What is the maximum value of N for which a program can calculate the answer in reasonable amount of time? Or what is the largest N we have seen so far?
Here is my program in CLPFD(Prolog):
% CLP(FD) library providing in/2, #\=/2 and labeling/2 (SWI-Prolog; adjust for other systems)
:- use_module(library(clpfd)).
generate([],_).
generate([H|T],N) :-
    H in 1..N,
    generate(T,N).
lenlist(L,N) :-
    lenlist(L,0,N).
lenlist([],N,N).
lenlist([_|T],P,N) :-
    P1 is P+1,
    lenlist(T,P1,N).
queens(N,L) :-
    generate(L,N), lenlist(L,N),
    safe(L),
    !,
    labeling([ffc],L).
notattack(X,Xs) :-
    notattack(X,Xs,1).
notattack(_,[],_).
notattack(X,[Y|Ys],N) :-
    X #\= Y,
    X #\= Y - N,
    X #\= Y + N,
    N1 is N + 1,
    notattack(X,Ys,N1).
safe([]).
safe([F|T]) :-
    notattack(F,T),
    safe(T).
This program works just fine, but the time it takes keeps on increasing with N.
Here is a sample execution:
?- queens(4,L).
L = [2, 4, 1, 3] ;
L = [3, 1, 4, 2] ;
No
This means you place the 4 queens at Row 2 in Column 1, Row 4 in Column 2, Row 1 in Column 3 and Row 3 in Column 4 (on a 4 by 4 chess board).
Now let's see how this program performs (time taken to calculate the first permutation):
For N=4,5,...,10: computes within a second
For N=11-30: takes between 1-3 seconds
For N=40..50: still calculates within a minute
At N=60: it runs out of global stack (the search space being enormous).
This was a past Homework problem. (The original problem was just to code N-Queens)
I am also interested in seeing alternate implementations in other languages (which perform better than my implementation), or in hearing whether there is room for improvement in my algorithm/program.
This discussion conflates three different computational problems: (1) Finding a solution to the N queens problem, (2) Listing all solutions for some fixed N, and (3) counting all of the solutions for some fixed N. The first problem looks tricky at first for a size of board such as N=8. However, as Wikipedia suggests, in some key ways it is easy when N is large. The queens on a large board don't communicate all that much. Except for memory constraints, a heuristic repair algorithm has an easier and easier job as N increases.
Listing every solution is a different matter. That can probably be done with a good dynamic programming code up to a size that is large enough that there is no point in reading the output.
The most interesting version of the question is to count the solutions. The state of the art is summarized in a fabulous reference known as The Encyclopedia of Integer Sequences. It has been computed up to N=26. I would guess that that also uses dynamic programming, but unlike the case of listing every solution, the algorithmic problem is much deeper and open to further advances.
Loren Pechtel said: "Now for some real insanity: 29 took 9 seconds. 30 took almost 6 minutes!"
This fascinating lack of predictability in backtrack-complexity for different board sizes was the part of this puzzle that most interested me. For years I've been building a list of the 'counts' of algorithm steps needed to find the first solution for each board size - using the simple and well known depth-first algorithm, in a recursive C++ function.
Here's a list of all those 'counts' for boards up to N=49 ... minus N=46 and N=48 which are still work-in-progress:
http://queens.cspea.co.uk/csp-q-allplaced.html
(I've got that listed in the Encyclopedia of Integer Sequences (OEIS) as A140450)
That page includes a link to a list of the matching first solutions.
(My list of First Solutions is OEIS Sequence number A141843)
I don't primarily record how much processing time each solution demands, but rather I record how many failed queen-placements were needed prior to discovery of each board's algorithmically-first solution. Of course the rate of queen placements depends on CPU performance, but given a quick test-run on a particular CPU and a particular board size, it's an easy matter to calculate how long it took to solve one of these 'found' solutions.
For example, on an Intel Pentium D 3.4GHz CPU, using a single CPU thread -
For N=35 my program 'placed' 24 million queens per second and took just 6 minutes to find the first solution.
For N=47 my program 'placed' 20.5 million queens per second and took 199 days.
My current 2.8GHz i7-860 is thrashing through about 28.6 million queens per second, trying to find the first solution for N=48. So far it has taken over 550 days (theoretically, if it had never been interrupted) to UNsuccessfully place 1,369,331,731,000,000 (and rapidly climbing) queens.
My web site doesn't (yet) show any C++ code, but I do give a link on that web page to my simple illustration of each one of the 15 algorithm steps needed to solve the N=5 board.
It's a delicious puzzle indeed!
Which Prolog system are you using? For example, with recent versions of SWI-Prolog, you can readily find solutions for N=80 and N=100 within fractions of a second, using your original code. Many other Prolog systems will be much faster than that.
The N-queens problem is even featured in one of the online examples of SWI-Prolog, available as CLP(FD) queens in SWISH.
Example with 100 queens:
?- time((n_queens(100, Qs), labeling([ff], Qs))).
Qs = [1, 3, 5, 57, 59 | ...] .
2,984,158 inferences, 0.299 CPU in 0.299 seconds (100% CPU, 9964202 Lips)
SWISH also shows you nice images of solutions.
Here is an animated GIF showing the complete solution process for N=40 queens with SWI-Prolog:
A short solution presented by Raymond Hettinger at PyCon (Easy AI in Python):
#!/usr/bin/env python3
from itertools import permutations
n = 12
cols = range(n)
for vec in permutations(cols):
    # vec[i] is the row of the queen in column i; a permutation already rules out
    # shared rows and columns, so only the two diagonal directions need checking.
    if (n == len(set(vec[i] + i for i in cols))
          == len(set(vec[i] - i for i in cols))):
        print(vec)
Computing all permutations is not scalable, though (O(n!)).
As to what is the largest N solved by computers there are references in literature in which a solution for N around 3*10^6 has been found using a conflict repair algorithm (i.e. local search). See for example the classical paper of [Sosic and Gu].
As to exact solving with backtracking, there exist some clever branching heuristics which achieve correct configurations with almost no backtracking. These heuristics can also be used to find the first-k solutions to the problem: after finding an initial correct configuration the search backtracks to find other valid configurations in the vicinity.
References for these almost perfect heuristics are [Kale 90] and [San Segundo 2011]
What is the maximum value of N for which a program can calculate the answer in reasonable amount of time? Or what is the largest N we have seen so far?
There is no limit. That is, checking for the validity of a solution is more costly than constructing one solution plus seven symmetrical ones.
See Wikipedia:
"Explicit solutions exist for placing n queens on an n × n board for all n ≥ 4, requiring no combinatorial search whatsoever."
I dragged out an old Delphi program that counted the number of solutions for any given board size, did a quick modification to make it stop after one hit and I'm seeing an odd pattern in the data:
The first board that took over 1 second to solve was n = 20. 21 solved in 62 milliseconds, though. (Note: This is based off Now, not any high precision system.) 22 took 10 seconds, not to be repeated until 28.
I don't know how good the optimization is as this was originally a highly optimized routine from back when the rules of optimization were very different. I did do one thing very different than most implementations, though--it has no board. Rather, I'm tracking which columns and diagonals are attacked and adding one queen per row. This means 3 array lookups per cell tested and no multiplication at all. (As I said, from when the rules were very different.)
Now for some real insanity: 29 took 9 seconds. 30 took almost 6 minutes!
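The board-free bookkeeping described above (one queen per row, with lookups only for attacked columns and the two diagonal directions) might look roughly like this in Python. It is a plain depth-first sketch, not the original Delphi routine, and first_solution is a name made up for this example.

```python
def first_solution(n):
    """Return the first solution found by row-by-row backtracking, or None."""
    cols = [False] * n                 # cols[c]: column c is attacked
    diag_sum = [False] * (2 * n - 1)   # indexed by row + col
    diag_diff = [False] * (2 * n - 1)  # indexed by row - col + n - 1
    placement = []                     # placement[r] = column of the queen in row r

    def place(row):
        if row == n:
            return True
        for col in range(n):
            d1, d2 = row + col, row - col + n - 1
            if cols[col] or diag_sum[d1] or diag_diff[d2]:
                continue               # three lookups per cell, no board scan
            cols[col] = diag_sum[d1] = diag_diff[d2] = True
            placement.append(col)
            if place(row + 1):
                return True
            placement.pop()            # undo and try the next column
            cols[col] = diag_sum[d1] = diag_diff[d2] = False
        return False

    return placement if place(0) else None


print(first_solution(8))
```

Because it recurses once per row, very large N would hit Python's recursion limit; it is meant to show the bookkeeping, not to compete on speed.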
Actually constrained random walk (generate and test) like what bakore outlined is the way to go if you just need a handful of solutions because these can be generated rapidly. I did this for a class when I was 20 or 21 and published the solution in the Journal of Pascal, Ada & Modula-2, March 1987, "The Queens Problem Revisited". I just dusted off the code from that article today (and this is very inefficient code) and after fixing a couple of problems have been generating N=26 ... N=60 solutions.
If you only want 1 solution then it can be found greedily in linear time O(N). My code in python:
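import numpy as np
n = int(input("Enter n: "))              # explicit construction, assumes n >= 4
rs = np.zeros(n, dtype=np.int64)         # rs[i] = row (0-based) of the queen in column i
board = np.zeros((n, n), dtype=np.int64)
k = 0
if n % 6 == 2:
    # even rows, then odd rows with 1 and 3 swapped and 5 moved to the end
    for i in range(2, n + 1, 2):
        rs[k] = i - 1
        k += 1
    rs[k] = 3 - 1
    k += 1
    rs[k] = 1 - 1
    k += 1
    for i in range(7, n + 1, 2):
        rs[k] = i - 1
        k += 1
    rs[k] = 5 - 1
elif n % 6 == 3:
    # even rows with 2 moved to the end, then odd rows with 1 and 3 moved to the end
    rs[k] = 4 - 1
    k += 1
    for i in range(6, n + 1, 2):
        rs[k] = i - 1
        k += 1
    rs[k] = 2 - 1
    k += 1
    for i in range(5, n + 1, 2):
        rs[k] = i - 1
        k += 1
    rs[k] = 1 - 1
    k += 1
    rs[k] = 3 - 1
else:
    # all even rows followed by all odd rows
    for i in range(2, n + 1, 2):
        rs[k] = i - 1
        k += 1
    for i in range(1, n + 1, 2):
        rs[k] = i - 1
        k += 1
for i in range(n):
    board[rs[i]][i] = 1
print("\n")
for i in range(n):
    for j in range(n):
        print(board[i][j], end=" ")
    print()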
Here, however, printing takes O(N^2) time, and Python is a slower language, so anyone can try implementing it in other languages like C/C++ or Java. But even with Python it will get the first solution for n=1000 within 1 or 2 seconds.

Prime factor of 300 000 000 000?

I need to find out the prime factors of over 300 billion. I have a function that is adding to the list of them... very slowly! It has been running for about an hour now and I think it's got a fair distance to go still. Am I doing it completely wrong or is this expected?
Edit: I'm trying to find the largest prime factor of the number 600851475143.
Edit:
Result:
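{
    List<Int64> ListOfPrimeFactors = new List<Int64>();
    Int64 Number = 600851475143;
    Int64 DividingNumber = 2;
    // Use <= so that a remaining perfect square (e.g. 9 = 3 * 3) is still split.
    while (DividingNumber <= Number / DividingNumber)
    {
        if (Number % DividingNumber == 0)
        {
            // DividingNumber is prime here: all smaller factors were already divided out.
            ListOfPrimeFactors.Add(DividingNumber);
            Number = Number/DividingNumber;
        }
        else
            DividingNumber++;
    }
    // Whatever remains is the largest prime factor.
    ListOfPrimeFactors.Add(Number);
    listBox1.DataSource = ListOfPrimeFactors;
}
}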
Are you remembering to divide the number that you're factorizing by each factor as you find them?
Say, for example, you find that 2 is a factor. You can add that to your list of factors, but then you divide the number that you're trying to factorise by that value.
Now you're only searching for the factors of 150 billion. Each time around you should start from the factor you just found. So if 2 was a factor, test 2 again. If the next factor you find is 3, there's no point testing from 2 again.
And so on...
Finding prime factors is difficult using brute force, which sounds like the technique you are using.
Here are a few tips to speed it up somewhat:
Start low, not high
Don't bother testing each potential factor to see whether it is prime--just test LIKELY prime numbers (odd numbers that end in 1,3,7 or 9)
Don't bother testing even numbers (all divisible by 2), or odds that end in 5 (all divisible by 5). Of course, don't actually skip 2 and 5!!
When you find a prime factor, make sure to divide it out--don't continue to use your massive original number. See my example below.
If you find a factor, make sure to test it AGAIN to see if it is in there multiple times. Your number could be 2x2x3x7x7x7x31 or something like that.
Stop when you reach >= sqrt(remaining large number)
Edit: A simple example:
You are finding the factors of 275.
Test 275 for divisibility by 2. Does 275/2 = int(275/2)? No. Failed.
Test 275 for divisibility by 3. Failed.
Skip 4!
Test 275 for divisibility by 5. YES! 275/5 = 55. So your NEW test number is now 55.
Test 55 for divisibility by 5. YES! 55/5 = 11. So your NEW test number is now 11.
BUT 5 > sqrt (11), so 11 is prime, and you can stop!
So 275 = 5 * 5 * 11
Make more sense?
Factoring big numbers is a hard problem. So hard, in fact, that we rely on it to keep RSA secure. But take a look at the wikipedia page for some pointers to algorithms that can help. But for a number that small, it really shouldn't be taking that long, unless you are re-doing work over and over again that you don't have to somewhere.
For the brute-force solution, remember that you can do some mini-optimizations:
Check 2 specially, then only check odd numbers.
You only ever need to check up to the square-root of the number (if you find no factors by then, then the number is prime).
Once you find a factor, don't use the original number to find the next factor, divide it by the found factor, and search the new smaller number.
When you find a factor, divide it through as many times as you can. After that, you never need to check that number, or any smaller numbers again.
If you do all the above, each new factor you find will be prime, since any smaller factors have already been removed.
Here is an XSLT solution!
This XSLT transformation takes 0.109 sec.
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:saxon="http://saxon.sf.net/"
 xmlns:f="http://fxsl.sf.net/"
 exclude-result-prefixes="xs saxon f"
 >
  <xsl:import href="../f/func-Primes.xsl"/>
  <xsl:output method="text"/>
  <xsl:template name="initial" match="/*">
    <xsl:sequence select="f:maxPrimeFactor(600851475143)"/>
  </xsl:template>
  <xsl:function name="f:maxPrimeFactor" as="xs:integer">
    <xsl:param name="pNum" as="xs:integer"/>
    <xsl:sequence select=
     "if(f:isPrime($pNum))
        then $pNum
        else
          for $vEnd in xs:integer(floor(f:sqrt($pNum, 0.1E0))),
              $vDiv1 in (2 to $vEnd)[$pNum mod . = 0][1],
              $vDiv2 in $pNum idiv $vDiv1
            return
              max((f:maxPrimeFactor($vDiv1),f:maxPrimeFactor($vDiv2)))
     "/>
  </xsl:function>
</xsl:stylesheet>
This transformation produces the correct result (the maximum prime factor of 600851475143) in just 0.109 sec.:
6857
The transformation uses the f:sqrt() and f:isPrime() defined in FXSL 2.0 -- a library for functional programming in XSLT. FXSL is itself written entirely in XSLT.
f:isPrime() uses Fermat's little theorem so that it can determine primality efficiently.
One last thing nobody has mentioned, perhaps because it seems obvious. Every time you find a factor and divide it out, keep trying the factor until it fails.
64 only has one prime factor, 2. You will find that out pretty trivially if you keep dividing out the 2 until you can't anymore.
$ time factor 300000000000 > /dev/null
real 0m0.027s
user 0m0.000s
sys 0m0.001s
You're doing something wrong if it's taking an hour. You might even have an infinite loop somewhere - make sure you're not using 32-bit ints.
The key to understanding why the square root is important: each factor of n below the square root of n has a corresponding factor above it. To see this, consider that if x is a factor of n, then n/x = m, which means that n/m = x, hence m is also a factor.
I wouldn't expect it to take very long at all - that's not a particularly large number.
Could you give us an example number which is causing your code difficulties?
Here's one site where you can get answers: Factoris - Online factorization service. It can do really big numbers, but it also can factorize algebraic expressions.
The fastest algorithms are sieve algorithms, and are based on arcane areas of discrete mathematics (over my head at least), complicated to implement and test.
The simplest algorithm for factoring is probably (as others have said) trial division, in the spirit of the Sieve of Eratosthenes. Things to remember about using this to factor a number N:
general idea: you're checking an increasing sequence of possible integer factors x to see if they evenly divide your candidate number N (in C/Java/Javascript check whether N % x == 0) in which case N is not prime.
you just need to go up to sqrt(N), but don't actually calculate sqrt(N): loop as long as your test factor x passes the test x*x <= N
if you have the memory to save a bunch of previous primes, use only those as the test factors (and don't save a prime P if P*P > N_max, since you'll never use it)
Even if you don't save the previous primes, for possible factors x just check 2 and all the odd numbers. Yes, it will take longer, but not that much longer for reasonable sized numbers. The prime-counting function and its approximations can tell you what fraction of numbers are prime; this fraction decreases very slowly. Even for 2^64 = approx 1.8x10^19, roughly one out of every 43 numbers is prime (= one out of every 21.5 odd numbers is prime). For factors of numbers less than 2^64, those factors x are less than 2^32 where about one out of every 20 numbers is prime = one out of every 10 odd numbers is prime. So you'll have to test 10 times as many numbers, but the loop should be a bit faster and you don't have to mess around with storing all those primes.
There are also some older and simpler sieve algorithms that are a little bit more complex but still fairly understandable. See Dixon's, Shanks' and Fermat's factoring algorithms. I read an article about one of these once, can't remember which one, but they're all fairly straightforward and use algebraic properties of the differences of squares.
If you're just testing whether a number N is prime, and you don't actually care about the factors themselves, use a probabilistic primality test. Miller-Rabin is the most standard one, I think.
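A compact Miller-Rabin sketch in Python, for illustration. It uses a fixed set of small prime bases instead of random ones; that set is commonly cited as sufficient to make the test exact for 64-bit inputs, but treat that as an assumption of this example rather than something stated above. is_probable_prime is a made-up name.

```python
_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n):
    """Miller-Rabin test with fixed bases; True means 'prime as far as these bases can tell'."""
    if n < 2:
        return False
    for p in _BASES:
        if n % p == 0:
            return n == p              # small primes and their multiples
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in _BASES:
        x = pow(a, d, n)               # modular exponentiation, built in
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False               # a is a witness that n is composite
    return True


print(is_probable_prime(6857), is_probable_prime(600851475143))
```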
I spent some time on this since it just sucked me in. I won't paste the code here just yet. Instead see this factors.py gist if you're curious.
Mind you, I didn't know anything about factoring (still don't) before reading this question. It's just a Python implementation of BradC's answer above.
On my MacBook it takes 0.002 secs to factor the number mentioned in the question (600851475143).
There must obviously be much, much faster ways of doing this. My program takes 19 secs to compute the factors of 6008514751431331. But the Factoris service just spits out the answer in no-time.
The specific number is 300425737571? It trivially factors into 131 * 151 * 673 * 22567.
I don't see what all the fuss is...
Here's some Haskell goodness for you guys :)
primeFactors n = factor n primes
  where factor n (p:ps) | p*p > n = [n]
                        | n `mod` p /= 0 = factor n ps
                        | otherwise = p : factor (n `div` p) (p:ps)
primes = 2 : filter ((==1) . length . primeFactors) [3,5..]
Took it about .5 seconds to find them, so I'd call that a success.
You could use the sieve of Eratosthenes to find the primes and see if your number is divisible by those you find.
You only need to check its remainder mod(n) where n is a prime <= sqrt(N) where N is the number you are trying to factor. It really shouldn't take over an hour, even on a really slow computer or a TI-85.
Your algorithm must be FUBAR. This only takes about 0.1s on my 1.6 GHz netbook in Python. Python isn't known for its blazing speed. It does, however, have arbitrary precision integers...
import math
import operator
from functools import reduce
def factor(n):
    """Given the number n to factor, yield its prime factors.
    factor(1) yields one result: 1. Negative n is not supported."""
    M = math.sqrt(n)  # no factors larger than M
    p = 2             # candidate factor to test
    while p <= M:     # keep looking until pointless
        d, m = divmod(n, p)
        if m == 0:
            yield p   # p is a prime factor
            n = d     # divide n accordingly
            M = math.sqrt(n)  # and adjust M
        else:
            p += 1    # p didn't pan out, try the next candidate
    yield n  # whatever's left in n is a prime factor
def test_factor(n):
    f = factor(n)
    n2 = reduce(operator.mul, f)
    assert n2 == n
def example():
    n = 600851475143
    f = list(factor(n))
    assert reduce(operator.mul, f) == n
    print(n, "=", "*".join(str(p) for p in f))
example()
# output:
# 600851475143 = 71*839*1471*6857
(This code seems to work in defiance of the fact that I don't know enough about number theory to fill a thimble.)
Just to expand/improve slightly on the "only test odd numbers that don't end in 5" suggestions...
All primes greater than 3 are either one more or one less than a multiple of 6 (6x + 1 or 6x - 1 for integer values of x).
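A sketch of trial division that only tries candidates of the form 6x - 1 and 6x + 1 after handling 2 and 3, as suggested above (prime_factors is a name made up for this example):

```python
def prime_factors(n):
    """Return the prime factors of n (with multiplicity) in increasing order."""
    factors = []
    for p in (2, 3):
        while n % p == 0:
            factors.append(p)
            n //= p
    candidate = 5                      # 5 = 6*1 - 1
    while candidate * candidate <= n:
        for p in (candidate, candidate + 2):   # 6x - 1 and 6x + 1
            while n % p == 0:
                factors.append(p)
                n //= p
        candidate += 6
    if n > 1:
        factors.append(n)              # the remaining cofactor is prime
    return factors


print(prime_factors(600851475143))
```

Some candidates (25, 35, 49, ...) are not prime, but they can never divide n at that point because their own prime factors were already divided out.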
It shouldn't take that long, even with a relatively naive brute force. For that specific number, I can factor it in my head in about one second.
You say you don't want solutions(?), but here's your "subtle" hint. The only prime factors of the number are the lowest three primes.
Semi-prime numbers of that size are used for encryption, so I am curious as to what you exactly want to use them for.
That aside, there currently are not good ways to find the prime factorization of large numbers in a relatively small amount of time.

Resources