How do I determine whether my calculation of pi is accurate? - algorithm

I was trying various methods to implement a program that gives the digits of pi sequentially. I tried the Taylor series method, but it proved to converge extremely slowly (when I compared my result with the online values after some time). Anyway, I am trying better algorithms.
So, while writing the program I got stuck on a problem, as with all algorithms: How do I know that the n digits that I've calculated are accurate?

Since I'm the current world record holder for the most digits of pi, I'll add my two cents:
Unless you're actually setting a new world record, the common practice is just to verify the computed digits against the known values. So that's simple enough.
In fact, I have a webpage that lists snippets of digits for the purpose of verifying computations against them: http://www.numberworld.org/digits/Pi/
But when you get into world-record territory, there's nothing to compare against.
Historically, the standard approach for verifying that computed digits are correct is to recompute the digits using a second algorithm. So if either computation goes bad, the digits at the end won't match.
This does typically more than double the amount of time needed (since the second algorithm is usually slower). But it's the only way to verify the computed digits once you've wandered into the uncharted territory of never-before-computed digits and a new world record.
Back in the days where supercomputers were setting the records, two different AGM algorithms were commonly used:
Gauss–Legendre algorithm
Borwein's algorithm
These are both O(N log(N)^2) algorithms that were fairly easy to implement.
However, nowadays, things are a bit different. In the last three world records, instead of performing two computations, we performed only one computation using the fastest known formula (Chudnovsky Formula):
This algorithm is much harder to implement, but it is a lot faster than the AGM algorithms.
Then we verify the binary digits using the BBP formulas for digit extraction.
This formula allows you to compute arbitrary binary digits without computing all the digits before it. So it is used to verify the last few computed binary digits. Therefore it is much faster than a full computation.
The advantage of this is:
Only one expensive computation is needed.
The disadvantage is:
An implementation of the Bailey–Borwein–Plouffe (BBP) formula is needed.
An additional step is needed to verify the radix conversion from binary to decimal.
I've glossed over some details of why verifying the last few digits implies that all the digits are correct. But it is easy to see this since any computation error will propagate to the last digits.
Now this last step (verifying the conversion) is actually fairly important. One of the previous world record holders actually called us out on this because, initially, I didn't give a sufficient description of how it worked.
So I've pulled this snippet from my blog:
N = # of decimal digits desired
p = 64-bit prime number
Compute A using base 10 arithmetic and B using binary arithmetic.
If A = B, then with "extremely high probability", the conversion is correct.
For further reading, see my blog post Pi - 5 Trillion Digits.

Undoubtedly, for your purposes (which I assume is just a programming exercise), the best thing is to check your results against any of the listings of the digits of pi on the web.
And how do we know that those values are correct? Well, I could say that there are computer-science-y ways to prove that an implementation of an algorithm is correct.
More pragmatically, if different people use different algorithms, and they all agree to (pick a number) a thousand (million, whatever) decimal places, that should give you a warm fuzzy feeling that they got it right.
Historically, William Shanks published pi to 707 decimal places in 1873. Poor guy, he made a mistake starting at the 528th decimal place.
Very interestingly, in 1995 an algorithm was published that had the property that would directly calculate the nth digit (base 16) of pi without having to calculate all the previous digits!
Finally, I hope your initial algorithm wasn't pi/4 = 1 - 1/3 + 1/5 - 1/7 + ... That may be the simplest to program, but it's also one of the slowest ways to do so. Check out the pi article on Wikipedia for faster approaches.

You could use multiple approaches and see if they converge to the same answer. Or grab some from the 'net. The Chudnovsky algorithm is usually used as a very fast method of calculating pi. http://www.craig-wood.com/nick/articles/pi-chudnovsky/

The Taylor series is one way to approximate pi. As noted it converges slowly.
The partial sums of the Taylor series can be shown to be within some multiplier of the next term away from the true value of pi.
Other means of approximating pi have similar ways to calculate the max error.
We know this because we can prove it mathematically.

You could try computing sin(pi/2) (or cos(pi/2) for that matter) using the (fairly) quickly converging power series for sin and cos. (Even better: use various doubling formulas to compute nearer x=0 for faster convergence.)
BTW, better than using series for tan(x) is, with computing say cos(x) as a black box (e.g. you could use taylor series as above) is to do root finding via Newton. There certainly are better algorithms out there, but if you don't want to verify tons of digits this should suffice (and it's not that tricky to implement, and you only need a bit of calculus to understand why it works.)

There is an algorithm for digit-wise evaluation of arctan, just to answer the question, pi = 4 arctan 1 :)

Related

fast algorithm for computing 1/d?(SRT, goldsmidt, newton raphson,...)

I want to find a fast algorithm for computing 1/d , where d is double ( albeit it can be converted to integer) what is the best algorithm of many algorithms(SRT , goldschmidt,newton raphson, ...)?I'm writing my program in c language.
thanks in advance.
The fastest program is: double result = 1 / d;
CPU:s already use a root finding iterative algorithm like the ones you describe, to find the reciprocal 1/d. So you should find it difficult to beat it using a software implementation of the same algorithm.
If you have few/known denominators then try a lookup table. This is the usual approach for even slower functions such as trig functions.
Otherwise: just compute 1/d. It will be the fastest you can do. And there is an endless list of things you can do to speed up arithmetic if you have to
use 32 bit (single) instead of 64bit (double) precision. FP Division on takes a number of cycles proportional to the number of bits.
vectorize the operations. For example I believe you can compute four 32bit float divisions in parallel with SSE2, or even more in parallel by doing it on the GPU.
I've asked it from someone and I get my answer:
So, you can't add a hardware divider to the FPGA then? Or fast reciprocal support?
Anyway it depends. Does it have fast multiplication? If not, well, that's a problem, you could only implement the slow methods then.
If you have fast multiplication and IEEE floats, you can use the weird trick I linked to in my previous post with a couple of refinement steps. That's really just Newton–Raphson division with a simpler calculation for the initial approximation (but afaik it still only takes 3 refinements for single-precision floats, just like the regular initial approximation). Fast reciprocal support works that way too - give a fast initial approximation (handling the exponent right and getting significant bits from a lookup table, if you get 12 significant bits that way you only need one refinement step for single-precision or, 13 are enough to get 2 steps for double-precision) and optionally have instructions that help implement the refinement step (like AMD's PFRCPIT1 and PFRCPIT2), for example to calculate Y = (1 - D*X) and to calculate X + X * Y.
Even without those tricks Newton–Raphson division is still not bad, with the linear approximation it takes only 4 refinements for double-precision floats, but it also takes some annoying exponent adjustments to get in the right range first (in hardware that wouldn't be half as annoying).
Goldschmidt division is, afaik, roughly equivalent in performance and might have a slightly less complex implementation. It's really the same sort of deal - trickery with the exponent to get in the right range, the "2 - something" estimation trick (which is rearranged in Newton-Raphson division, but it's really the same thing), and doing the refinement step until all the bits are right. It just looks a little different.

Programming Logic: Finding the smallest equation to a large number

I do not know a whole lot about math, so I don't know how to begin to google what I am looking for, so I rely on the intelligence of experts to help me understand what I am after...
I am trying to find the smallest string of equations for a particular large number. For example given the number
"39402006196394479212279040100143613805079739270465446667948293404245721771497210611414266254884915640806627990306816"
The smallest equation is 64^64 (that I know of) . It contains only 5 bytes.
Basically the program would reverse the math, instead of taking an expression and finding an answer, it takes an answer and finds the most simplistic expression. Simplistic is this case means smallest string, not really simple math.
Has this already been created? If so where can I find it? I am looking to take extremely HUGE numbers (10^10000000) and break them down to hopefully expressions that will be like 100 characters in length. Is this even possible? are modern CPUs/GPUs not capable of doing such big calculations?
Edit:
Ok. So finding the smallest equation takes WAY too much time, judging on answers. Is there anyway to bruteforce this and get the smallest found thus far?
For example given a number super super large. Sometimes taking the sqaureroot of number will result in an expression smaller than the number itself.
As far as what expressions it would start off it, well it would naturally try expressions that would the expression the smallest. I am sure there is tons of math things I dont know, but one of the ways to make a number a lot smaller is powers.
Just to throw another keyword in your Google hopper, see Kolmogorov Complexity. The Kolmogorov complexity of a string is the size of the smallest Turing machine that outputs the string, given an empty input. This is one way to formalize what you seem to be after. However, calculating the Kolmogorov complexity of a given string is known to be an undecidable problem :)
Hope this helps,
TJ
There's a good program to do that here:
http://mrob.com/pub/ries/index.html
I asked the question "what's the point of doing this", as I don't know if you're looking at this question from a mathemetics point of view, or a large number factoring point of view.
As other answers have considered the factoring point of view, I'll look at the maths angle. In particular, the problem you are describing is a compressibility problem. This is where you have a number, and want to describe it in the smallest algorithm. Highly random numbers have very poor compressibility, as to describe them you either have to write out all of the digits, or describe a deterministic algorithm which is only slightly smaller than the number itself.
There is currently no general mathemetical theorem which can determine if a representation of a number is the smallest possible for that number (although a lower bound can be discovered by understanding shannon's information theory). (I said general theorem, as special cases do exist).
As you said you don't know a whole lot of math, this is perhaps not a useful answer for you...
You're doing a form of lossless compression, and lossless compression doesn't work on random data. Suppose, to the contrary, that you had a way of compressing N-bit numbers into N-1-bit numbers. In that case, you'd have 2^N values to compress into 2^N-1 designations, which is an average of 2 values per designation, so your average designation couldn't be uncompressed. Lossless compression works well on relatively structured data, where data we're likely to get is compressed small, and data we aren't going to get actually grows some.
It's a little more complicated than that, since you're compressing partly by allowing more information per character. (There are a greater number of N-character sequences involving digits and operators than digits alone.) Still, you're not going to get lossless compression that, on the average, is better than just writing the whole numbers in binary.
It looks like you're basically wanting to do factoring on an arbitrarily large number. That is such a difficult problem that it actually serves as the cornerstone of modern-day cryptography.
This really appears to be a mathematics problem, and not programming or computer science problem. You should ask this on https://math.stackexchange.com/
While your question remains unclear, perhaps integer relation finding is what you are after.
EDIT:
There is some speculation that finding a "short" form is somehow related to the factoring problem. I don't believe that is true unless your definition requires a product as the answer. Consider the following pseudo-algorithm which is just sketch and for which no optimization is attempted.
If "shortest" is a well-defined concept, then in general you get "short" expressions by using small integers to large powers. If N is my integer, then I can find an integer nearby that is 0 mod 4. How close? Within +/- 2. I can find an integer within +/- 4 that is 0 mod 8. And so on. Now that's just the powers of 2. I can perform the same exercise with 3, 5, 7, etc. We can, for example, easily find the nearest integer that is simultaneously the product of powers of 2, 3, 5, 7, 11, 13, and 17, call it N_1. Now compute N-N_1, call it d_1. Maybe d_1 is "short". If so, then N_1 (expressed as power of the prime) + d_1 is the answer. If not, recurse to find a "short" expression for d_1.
We can also pick integers that are maybe farther away than our first choice; even though the difference d_1 is larger, it might have a shorter form.
The existence of an infinite number of primes means that there will always be numbers that cannot be simplified by factoring. What you're asking for is not possible, sorry.

Given a number series, finding the Check Digit Algorithm...?

Suppose I have a series of index numbers that consists of a check digit. If I have a fair enough sample (Say 250 sample index numbers), do I have a way to extract the algorithm that has been used to generate the check digit?
I think there should be a programmatic approach atleast to find a set of possible algorithms.
UPDATE: The length of a index number is 8 Digits including the check digit.
No, not in the general case, since the number of possible algorithms is far more than what you may think. A sample space of 250 may not be enough to do proper numerical analysis.
For an extreme example, let's say your samples are all 15 digits long. You would not be able to reliably detect the algorithm if it changed the behaviour for those greater than 15 characters.
If you wanted to be sure, you should reverse engineer the code that checks the numbers for validity (if available).
If you know that the algorithm is drawn from a smaller subset than "every possible algorithm", then it might be possible. But algorithms may be only half the story - there's also the case where multipliers, exponentiation and wrap-around points change even using the same algorithm.
paxdiablo is correct, and you can't guess the algorithm without making any other assumption (or just having the whole sample space - then you can define the algorithm by a look up table).
However, if the check digit is calculated using some linear formula dependent on the "data digits" (which is a very common case, as you can see in the wikipedia article), given enough samples you can use Euler elimination.

Create a function for given input and ouput

Imagine, there are two same-sized sets of numbers.
Is it possible, and how, to create a function an algorithm or a subroutine which exactly maps input items to output items? Like:
Input = 1, 2, 3, 4
Output = 2, 3, 4, 5
and the function would be:
f(x): return x + 1
And by "function" I mean something slightly more comlex than [1]:
f(x):
if x == 1: return 2
if x == 2: return 3
if x == 3: return 4
if x == 4: return 5
This would be be useful for creating special hash functions or function approximations.
Update:
What I try to ask is to find out is whether there is a way to compress that trivial mapping example from above [1].
Finding the shortest program that outputs some string (sequence, function etc.) is equivalent to finding its Kolmogorov complexity, which is undecidable.
If "impossible" is not a satisfying answer, you have to restrict your problem. In all appropriately restricted cases (polynomials, rational functions, linear recurrences) finding an optimal algorithm will be easy as long as you understand what you're doing. Examples:
polynomial - Lagrange interpolation
rational function - Pade approximation
boolean formula - Karnaugh map
approximate solution - regression, linear case: linear regression
general packing of data - data compression; some techniques, like run-length encoding, are lossless, some not.
In case of polynomial sequences, it often helps to consider the sequence bn=an+1-an; this reduces quadratic relation to linear one, and a linear one to a constant sequence etc. But there's no silver bullet. You might build some heuristics (e.g. Mathematica has FindSequenceFunction - check that page to get an impression of how complex this can get) using genetic algorithms, random guesses, checking many built-in sequences and their compositions and so on. No matter what, any such program - in theory - is infinitely distant from perfection due to undecidability of Kolmogorov complexity. In practice, you might get satisfactory results, but this requires a lot of man-years.
See also another SO question. You might also implement some wrapper to OEIS in your application.
Fields:
Mostly, the limits of what can be done are described in
complexity theory - describing what problems can be solved "fast", like finding shortest path in graph, and what cannot, like playing generalized version of checkers (they're EXPTIME-complete).
information theory - describing how much "information" is carried by a random variable. For example, take coin tossing. Normally, it takes 1 bit to encode the result, and n bits to encode n results (using a long 0-1 sequence). Suppose now that you have a biased coin that gives tails 90% of time. Then, it is possible to find another way of describing n results that on average gives much shorter sequence. The number of bits per tossing needed for optimal coding (less than 1 in that case!) is called entropy; the plot in that article shows how much information is carried (1 bit for 1/2-1/2, less than 1 for biased coin, 0 bits if the coin lands always on the same side).
algorithmic information theory - that attempts to join complexity theory and information theory. Kolmogorov complexity belongs here. You may consider a string "random" if it has large Kolmogorov complexity: aaaaaaaaaaaa is not a random string, f8a34olx probably is. So, a random string is incompressible (Volchan's What is a random sequence is a very readable introduction.). Chaitin's algorithmic information theory book is available for download. Quote: "[...] we construct an equation involving only whole numbers and addition, multiplication and exponentiation, with the property that if one varies a parameter and asks whether the number of solutions is finite or infinite, the answer to this question is indistinguishable from the result of independent tosses of a fair coin." (in other words no algorithm can guess that result with probability > 1/2). I haven't read that book however, so can't rate it.
Strongly related to information theory is coding theory, that describes error-correcting codes. Example result: it is possible to encode 4 bits to 7 bits such that it will be possible to detect and correct any single error, or detect two errors (Hamming(7,4)).
The "positive" side are:
symbolic algorithms for Lagrange interpolation and Pade approximation are a part of computer algebra/symbolic computation; von zur Gathen, Gerhard "Modern Computer Algebra" is a good reference.
data compresssion - here you'd better ask someone else for references :)
Ok, I don't understand your question, but I'm going to give it a shot.
If you only have 2 sets of numbers and you want to find f where y = f(x), then you can try curve-fitting to give you an approximate "map".
In this case, it's linear so curve-fitting would work. You could try different models to see which works best and choose based on minimizing an error metric.
Is this what you had in mind?
Here's another link to curve-fitting and an image from that article:
It seems to me that you want a hashtable. These are based in hash functions and there are known hash functions that work better than others depending on the expected input and desired output.
If what you want a algorithmic way of mapping arbitrary input to arbitrary output, this is not feasible in the general case, as it totally depends on the input and output set.
For example, in the trivial sample you have there, the function is immediately obvious, f(x): x+1. In others it may be very hard or even impossible to generate an exact function describing the mapping, you would have to approximate or just use directly a map.
In some cases (such as your example), linear regression or similar statistical models could find the relation between your input and output sets.
Doing this in the general case is arbitrarially difficult. For example, consider a block cipher used in ECB mode: It maps an input integer to an output integer, but - by design - deriving any general mapping from specific examples is infeasible. In fact, for a good cipher, even with the complete set of mappings between input and output blocks, you still couldn't determine how to calculate that mapping on a general basis.
Obviously, a cipher is an extreme example, but it serves to illustrate that there's no (known) general procedure for doing what you ask.
Discerning an underlying map from input and output data is exactly what Neural Nets are about! You have unknowingly stumbled across a great branch of research in computer science.

Recombine Number to Equal Math Formula

I've been thinking about a math/algorithm problem and would appreciate your input on how to solve it!
If I have a number (e.g. 479), I would like to recombine its digits or combination of them to a math formula that matches the original number. All digits should be used in their original order, but may be combined to numbers (hence, 479 allows for 4, 7, 9, 47, 79) but each digit may only be used once, so you can not have something like 4x47x9 as now the number 4 was used twice.
Now an example just to demonstrate on how I think of it. The example is mathematically incorrect because I couldn't come up with a good example that actually works, but it demonstrates input and expected output.
Example Input: 29485235
Example Output: 2x9+48/523^5
As I said, my example does not add up (2x9+48/523^5 doesn't result in 29485235) but I wondered if there is an algorithm that would actually allow me to find such a formula consisting of the source number's digits in their original order which would upon calculation yield the original number.
On the type of math used, I'd say parenthesis () and Add/Sub/Mul/Div/Pow/Sqrt.
Any ideas on how to do this? My thought was on simply brute forcing it by chopping the number apart by random and doing calculations hoping for a matching result. There's gotta be a better way though?
Edit: If it's any easier in non-original order, or you have an idea to solve this while ignoring some of the 'conditions' described above, it would still help tremendously to understand how to go about solving such a problem.
For numbers up to about 6 digits or so, I'd say brute-force it according to the following scheme:
1) Split your initial value into a list (array, whatever, according to language) of numbers. Initially, these are the digits.
2) For each pair of numbers, combine them together using one of the operators. If the result is the target number, then return success (and print out all the operations performed on your way out). Otherwise if it's an integer, recurse on the new, smaller list consisting of the number you just calculated, and the numbers you didn't use. Or you might want to allow non-integer intermediate results, which will make the search space somewhat bigger. The binary operations are:
Add
subtract
multiply
divide
power
concatenate (which may only be used on numbers which are either original digits, or have been produced by concatenation).
3) Allowing square root bloats the search space to infinity, since it's a unary operator. So you will need a way to limit the number of times it can be applied, and I'm not sure what that will be (loss of precision as the answer approaches 1, maybe?). This is another reason to allow only integer intermediate values.
4) Exponentiation will rapidly cause overflows. 2^(9^(4^8)) is far too large to store all the digits directly [although in base 2 it's pretty obvious what they are ;-)]. So you'll either have to accept that you might miss solutions with large intermediate values, or else you'll have to write a bunch of code to do your arithmetic in terms of factors. These obviously don't interact very well with addition, so you might have to do some estimation. For example, just by looking at the magnitude of the number of factors we see that 2^(9^(4^8)) is nowhere near (2^35), so there's no need to calculate (2^(9^(4^8)) + 5) / (2^35). It can't possibly be 29485235, even if it were an integer (which it certainly isn't - another way to rule out this particular example). I think handling these numbers is harder than the rest of the problem put together, so perhaps you should limit yourself to single-digit powers to begin with, and perhaps to results which fit in a 64bit integer, depending what language you are using.
5) I forgot to exclude the trivial solution for any input, of just concatenating all the digits. That's pretty easy to handle, though, just maintain a parameter through the recursion which tells you whether you have performed any non-concatenation operations on the route to your current sub-problem. If you haven't, then ignore the false match.
My estimate of 6 digits is based on the fact that it's fairly easy to write a Countdown solver that runs in a fraction of a second even when there's no solution. This problem is different in that the digits have to be used in order, but there are more operations (Countdown does not permit exponentiation, square root, or concatenation, or non-integer intermediate results). Overall I think this problem is comparable, provided you resolve the square root and overflow issues. If you can solve one case in a fraction of a second, then you can brute force your way through a million candidates in reasonable time (assuming you don't mind leaving your PC on).
By 10 digits, brute force appears impossible, because you have to consider 10 billion cases, each with a significant amount of recursion required. So I guess you'll hit the limit of brute force somewhere between the two.
Note also that my simple algorithm at the top still has a lot of redundancy - it doesn't stop you doing (4,7,9,1) -> (47,9,1) -> (47,91), and then later also doing (4,7,9,1) -> (4,7,91) -> (47,91). So unless you work out where those duplicates are going to occur and avoid them, you'll attempt (47,91) twice. Obviously that's not much work when there's only 2 numbers in the list, but when there are 7 numbers in the list, you probably do not want to e.g. add 4 of them together in 6 different ways and then solve the resulting 4-number problem 6 times. Cleverness here is not required for the Countdown game, but for all I know in this problem it might make the difference between brute-forcing 8 digits, and brute-forcing 9 digits, which is quite significant.
Numbers like that, as I recall, are exceedingly rare, if extant. Some numbers can be expressed by their component digits in a different order, such as, say, 25 (5²).
Also, trying to brute-force solutions is hopeless, at best, given that the number of permutations increase extremely rapidly as the numbers grow in digits.
EDIT: Partial solution.
A partial solution solving some cases would be to factorize the number into its prime factors. If its prime factors are all the same, and the exponent and factor are both present in the digits of the number (such as is the case with 25) you have a specific solution.
Most numbers that do fall into these kinds of patterns will do so either with multiplication or pow() as their major driving force; addition simply doesn't increase it enough.
Short of building a neural network that replicates Carol Voorderman I can't see anything short of brute force working - humans are quite smart at seeing patterns in problems such as this but encoding such insight is really tough.

Resources