How to take the exponent of math/big.Float in Go?

I couldn't find anything in the API. Converting the number to a math/big.Int and back is not an option because the fractional component is significant to my calculation.
I'll end up repeatedly multiplying if there's no API, but that's a dissatisfying solution (math/big.Int.Exp is just O(log(n))) which might not be practical when I run into this problem again.
Thanks!

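You may use MantExp() to get the binary exponent of a big.Float. It breaks x into a mantissa and an exponent, with 0.5 <= |mant| < 1.0, and returns the exponent. Note that this is the exponent of the floating-point representation, not a power function. The components satisfy: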
x == mant × 2**exp
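As an illustration of the same decomposition outside Go, here is a minimal Python sketch using math.frexp, which splits an ordinary float into a mantissa and a power-of-two exponent under the same normalization. The wrapper name mant_exp is hypothetical and not part of any library.

```python
import math

def mant_exp(x):
    """Split x into (mant, exp) with x == mant * 2**exp and 0.5 <= |mant| < 1.

    Hypothetical helper; it simply wraps math.frexp.
    """
    return math.frexp(x)

mant, exp = mant_exp(10.0)
print(mant, exp)                # 0.625 4
print(mant * 2**exp == 10.0)    # True
```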

Related

Power of two to get an integer

I need a (fairly) fast way to get the following for my code.
Background: I have to work with powers of numbers and their product, so I decided to use logs.
Now I need a way to convert the log back to an integer.
I can't just take 2^log_val (I'm working with log base 2) because the answer will be too large. In fact I need to give the answer mod M for a given M.
I tried doing this. I wrote log_val as p+q, where q is a float, q < 1 and p is an integer.
Now I can calculate 2^p very fast using log n exponentiation along with the modulo, but I can't do anything with the 2^q. What I thought of doing is finding the first integral power of 2, say x, such that 2^(x+q) is very close to an integer, and then calculating 2^(p-x).
This is too long for me because in the worst case I'll take O(p) steps.
Is there a better way?
While working with large numbers as logs is usually a good approach, it won't work here. The issue is that working in log space throws away the least significant digits, so you have lost information and won't be able to go back. Working in mod space will also throw away information (otherwise your number gets too big, as you say), but it throws away the most significant digits instead.
For your particular problem POWERMUL, what I would do is to calculate the prime factorizations of the numbers from 1 to N. You have to be careful how you do it, since your N is fairly large.
Now, if your number is k with the prime factorization {2: 3, 5: 2} you get the factorization of k^m by {2: m*3, 5:m*2}. Division similarly turns into subtraction.
Once you have the prime factorization representation of f(N)/(f(r)*f(N-r)) you can recreate the integer with a combination of modular multiplication and exponentiation. The latter is a cool technique to look up. (In fact, languages like Python have it built in: pow(3, 16, 7) = 4.)
Have fun :)
If you need an answer mod N, you can often do each step of your whole calculation mod N. That way, you never exceed your system's integer size restrictions.
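The factorization approach described above can be sketched as follows. This is a minimal illustration, assuming the factorization is held as a dict mapping each prime to its exponent; the function name power_from_factorization and the modulus in the example are made up for the demonstration.

```python
def power_from_factorization(factors, modulus):
    """Rebuild prod(p**e) mod `modulus` from a {prime: exponent} dict.

    Every step is reduced mod `modulus`, so intermediate values stay small.
    """
    result = 1 % modulus
    for prime, exponent in factors.items():
        # Three-argument pow performs fast modular exponentiation.
        result = (result * pow(prime, exponent, modulus)) % modulus
    return result

# k = 200 has the factorization {2: 3, 5: 2}; k**4 has {2: 12, 5: 8}.
M = 1_000_000_007  # example modulus, chosen arbitrarily
k_to_the_4th = {2: 4 * 3, 5: 4 * 2}
print(power_from_factorization(k_to_the_4th, M) == pow(200, 4, M))  # True
```

Division of factorizations becomes subtraction of exponents, so a quotient known to be an integer can be rebuilt the same way without ever performing a modular division.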

How do pocket calculators simplify fractions and keep imprecise numbers as fractions?

Could someone explain how calculators (such as casio pocket ones) manage equations such as '500/12' and are able to return '125/3' as the result, alternately can someone name some algorithms which do this?
By imprecise numbers I mean numbers which cannot be represented in a fixed number of decimal places, such as 0.333 recurring.
Windows calculator is able to demonstrate this, if you perform '1/3' you will get '0.3333333333333333' as the answer, but then if you multiply this by 3 you will arrive back at '1'.
My HP's fraction display lets you set several modes for fraction display:
Set a maximum denominator. The displayed fraction is the n/d closest to the internal floating point value without d exceeding the maximum. For example, if the maximum is set to 10, the floating point number for pi is nearest the fraction 22/7. However, if the maximum is 1000, then the nearest fraction is 355/113.
Set an exact denominator and reduce the result. The displayed fraction is the n/d closest to the internal floating point value where d is equal to the exact denominator. Having computed n, the fraction is then reduced by the greatest common divisor. For example, if the denominator is fixed to be 32, then the floating point number 0.51 is nearest to 16/32 which gets reduced to 1/2. Likewise, the floating point number 0.516 is nearest to 17/32 which is irreducible.
Set an exact denominator and do not reduce the result. For example, 0.51 is shown as 16/32, an unreduced fraction.
The algorithm for the maximum-denominator approach uses continued fractions. An easy to follow example in Python can be found in the limit_denominator method at http://hg.python.org/cpython/file/2.7/Lib/fractions.py#l206 .
The method for the exact-denominator approach is easier. Given a denominator d and a floating point number x, the numerator is just d * x rounded to the nearest integer. Then reduce the fraction n/d by computing the greatest common divisor.
Optionally, the original floating point number can be replaced by the displayed fraction. This is known as a snap-to-grid. That way, you can enter 0.333 to create a fraction that is exactly equal to 1/3. This lets you do exact fractional arithmetic without round-off.
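Both display modes can be tried out with Python's standard fractions module. The following is a rough sketch; the two function names are invented for the illustration, while Fraction.limit_denominator is the standard-library method mentioned above.

```python
import math
from fractions import Fraction

def nearest_with_max_denominator(x, max_denominator):
    """Closest fraction to x whose denominator does not exceed max_denominator."""
    return Fraction(x).limit_denominator(max_denominator)

def nearest_with_exact_denominator(x, denominator, reduce=True):
    """Closest n/denominator to x, returned as an (n, d) pair, optionally reduced."""
    numerator = round(denominator * x)
    if reduce:
        g = math.gcd(numerator, denominator)
        return numerator // g, denominator // g
    return numerator, denominator

print(nearest_with_max_denominator(math.pi, 10))      # 22/7
print(nearest_with_max_denominator(math.pi, 1000))    # 355/113
print(nearest_with_exact_denominator(0.51, 32))       # (1, 2)
print(nearest_with_exact_denominator(0.516, 32))      # (17, 32)
print(nearest_with_exact_denominator(0.51, 32, reduce=False))  # (16, 32)
```

Snapping 0.333 to the grid with a small maximum denominator, as in nearest_with_max_denominator(0.333, 10), gives exactly 1/3.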
Hope this answer clears everything up for you :-) Let me know if any part needs elaboration or further explanation.
I'd suggest you look at the GMP library's rational number functions. At some point, you will need to accept finite precision in your calculations, unless the sequence of operations is particularly simple. The irrationals (transcendental functions / constants) can only be approximated, e.g., as continued fractions.

Inverse of number in binary

Suppose we have some arbitrary positive number x.
Is there a method to represent its inverse, 1/x, in binary?
e.g. x=5 //101
x's inverse is 1/x; its binary form is ...?
You'd find it the same way you would in decimal form: long division.
There is no shortcut just because you are in another base, although long division is significantly simpler in binary, since each quotient digit can only be 0 or 1.
Here is a very nice explanation of long division applied to binary numbers.
Although, just to let you know, most floating-point systems on today's machines do very fast division for you.
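A minimal sketch of binary long division, assuming a proper fraction (0 <= numerator < denominator) and a fixed number of output bits; the function name binary_fraction is made up for this example.

```python
def binary_fraction(numerator, denominator, bits):
    """First `bits` binary digits after the point of numerator/denominator.

    Assumes 0 <= numerator < denominator.
    """
    digits = []
    remainder = numerator
    for _ in range(bits):
        remainder *= 2                 # bring down the next binary place
        if remainder >= denominator:   # the quotient digit is 1
            digits.append("1")
            remainder -= denominator
        else:                          # the quotient digit is 0
            digits.append("0")
    return "0." + "".join(digits)

print(binary_fraction(1, 5, 12))  # 0.001100110011
```

For x = 5 the pattern 0011 repeats forever, which is why 1/5 has no finite binary expansion.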
In general, the only practical way to "express in binary" an arbitrary fraction is as a pair of integers, numerator and denominator. "Floating point", the most commonly used (and hardware supported) binary representation of non-integer numbers, can represent exactly only those fractions whose denominator (when the fraction is reduced to lowest terms) is a power of two, and, of course, only when the fixed number of bits allotted to the representation is sufficient for the number we'd like to represent. The latter limitation also holds for any fixed-size binary representation, including the simplest ones such as integers.
0.125 = 0.001b
0.0625 = 0.0001b
0.0078125 = 0.0000001b
0.00390625 = 0.00000001b
0.00048828125 = 0.00000000001b
0.000244140625 = 0.000000000001b
----------------------------------
0.199951171875 = 0.001100110011b
Knock yourself out if you want higher accuracy/precision.
Another form of multiplicative inverse takes advantage of the modulo nature of integer arithmetic as implemented on most computers; in your case the 32 bit value
11001100110011001100110011001101 (-858993459 signed int32 or 3435973837 unsigned int32) when multiplied by 5 equals 1 (mod 4294967296). Only values which are coprime with the power of two the modulo operates on have such multiplicative inverses.
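The modular inverse quoted above can be checked with Python's built-in pow. This sketch assumes Python 3.8 or later, where pow accepts a negative exponent together with a modulus; the variable names are illustrative only.

```python
MODULUS = 2**32  # 32-bit unsigned arithmetic wraps around modulo 2**32

# Modular inverse of 5: the value that multiplies with 5 to give 1 (mod 2**32).
inverse = pow(5, -1, MODULUS)
print(inverse)                  # 3435973837
print(format(inverse, "032b"))  # 11001100110011001100110011001101
print((5 * inverse) % MODULUS)  # 1

# Reinterpret the same bit pattern as a signed 32-bit integer.
print(inverse - MODULUS)        # -858993459
```

Calling pow with an even base, for example pow(6, -1, 2**32), raises ValueError, because an even number shares a factor with the power-of-two modulus and has no inverse.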
If you just need the first few bits of a binary fraction, this trick will give you those bits: (2 << 31) / x. But don't use this trick in any real software project, because it is a rough, inaccurate and plainly wrong way to represent the value.

Why is Math.sqrt(i*i).floor == i?

I am wondering if this is true: When I take the square root of a squared integer, like in
f = Math.sqrt(123*123)
I will get a floating point number very close to 123. Due to floating point representation precision, this could be something like 122.99999999999999999999 or 123.000000000000000000001.
Since floor(122.999999999999999999) is 122, I should get 122 instead of 123. So I expect that floor(sqrt(i*i)) == i-1 in about 50% of the cases. Strangely, for all the numbers I have tested, floor(sqrt(i*i)) == i. Here is a small Ruby script to test the first 100 million numbers:
100_000_000.times do |i|
puts i if Math.sqrt(i*i).floor != i
end
The above script never prints anything. Why is that so?
UPDATE: Thanks for the quick reply, this seems to be the solution: According to wikipedia
Any integer with absolute value less
than or equal to 2^24 can be exactly
represented in the single precision
format, and any integer with absolute
value less than or equal to 2^53 can
be exactly represented in the double
precision format.
Math.sqrt(i*i) starts to behave as I've expected it starting from i=9007199254740993, which is 2^53 + 1.
Here's the essence of your confusion:
Due to floating point representation
precision, this could be something
like 122.99999999999999999999 or
123.000000000000000000001.
This is false. It will always be exactly 123 on an IEEE-754 compliant system, which is almost all systems in these modern times. Floating-point arithmetic does not have "random error" or "noise". It has precise, deterministic rounding, and many simple computations (like this one) do not incur any rounding at all.
123 is exactly representable in floating-point, and so is 123*123 (so are all modest-sized integers). So no rounding error occurs when you convert 123*123 to a floating-point type. The result is exactly 15129.
Square root is a correctly rounded operation, per the IEEE-754 standard. This means that if there is an exact answer, the square root function is required to produce it. Since you are taking the square root of exactly 15129, which is exactly 123, that's exactly the result you get from the square root function. No rounding or approximation occurs.
Now, for how large of an integer will this be true?
Double precision can exactly represent all integers up to 2^53. So as long as i*i is less than 2^53, no rounding will occur in your computation, and the result will be exact for that reason. This means that for all i smaller than 94906265, we know the computation will be exact.
But you tried i larger than that! What's happening?
For the largest i that you tried, i*i is just barely larger than 2^53 (1.1102... * 2^53, actually). Because conversions from integer to double (or multiplication in double) are also correctly rounded operations, i*i will be the representable value closest to the exact square of i. In this case, since i*i is 54 bits wide, the rounding will happen in the very lowest bit. Thus we know that:
i*i as a double = the exact value of i*i + rounding
where rounding is either -1,0, or 1. If rounding is zero, then the square is exact, so the square root is exact, so we already know you get the right answer. Let's ignore that case.
So now we're looking at the square root of i*i +/- 1. Using a Taylor series expansion, the infinitely precise (unrounded) value of this square root is:
i * (1 +/- 1/(2i^2) + O(1/i^4))
Now this is a bit fiddly to see if you haven't done any floating point error analysis before, but if you use the fact that i^2 > 2^53, you can see that the:
1/(2i^2) + O(1/i^4)
term is smaller than 2^-54, which means that (since square root is correctly rounded, and hence its rounding error must be smaller than 2^-54), the rounded result of the sqrt function is exactly i.
It turns out that (with a similar analysis), for any exactly representable floating point number x, sqrt(x*x) is exactly x (assuming that the intermediate computation of x*x doesn't over- or underflow), so the only way you can encounter rounding for this type of computation is in the representation of x itself, which is why you see it starting at 2^53 + 1 (the smallest unrepresentable integer).
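The behaviour described in this answer can be spot-checked in Python, whose float is also an IEEE-754 double on common platforms. This is a small sketch that assumes math.sqrt is correctly rounded, as the standard requires; the helper name sqrt_roundtrips is invented here.

```python
import math

def sqrt_roundtrips(i):
    """True if floor(sqrt(i*i)) recovers i.

    i*i is computed exactly as a Python int, then converted to a double
    (correctly rounded) when passed to math.sqrt.
    """
    return math.floor(math.sqrt(i * i)) == i

# Exactly representable integers round-trip, even when i*i exceeds 2**53.
samples = [0, 1, 123, 94906265, 94906266, 10**8 - 1, 2**53]
print(all(sqrt_roundtrips(i) for i in samples))  # True

# 2**53 + 1 is the smallest positive integer a double cannot represent.
print(sqrt_roundtrips(2**53 + 1))                # False
```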
For "small" integers, there is usually an exact floating-point representation.
It's not too hard to find cases where this breaks down as you'd expect:
Math.sqrt(94949493295293425**2).floor
# => 94949493295293424
Math.sqrt(94949493295293426**2).floor
# => 94949493295293424
Math.sqrt(94949493295293427**2).floor
# => 94949493295293424
Ruby's Float is a double-precision floating point number, which means that it can accurately represent numbers with (rule of thumb) about 16 significant decimal digits. For regular single-precision floating point numbers it's about 7 significant digits.
You can find more information here:
What Every Computer Scientist Should Know About Floating-Point Arithmetic:
http://docs.sun.com/source/819-3693/ncg_goldberg.html

Best algorithm for avoiding loss of precision?

A recent homework assignment I have received asks us to take expressions which could create a loss of precision when performed in the computer, and alter them so that this loss is avoided.
Unfortunately, the directions for doing this haven't been made very clear. From watching various examples being performed, I know that there are certain methods of doing this: using Taylor series, using conjugates if square roots are involved, or finding a common denominator when two fractions are being subtracted.
However, I'm having some trouble noticing exactly when loss of precision is going to occur. So far the only thing I know for certain is that when you subtract two numbers that are close to being the same, loss of precision occurs since high order digits are significant, and you lose those from round off.
My question is what are some other common situations I should be looking for, and what are considered 'good' methods of approaching them?
For example, here is one problem:
f(x) = tan(x) − sin(x) when x ~ 0
What is the best and worst algorithm for evaluating this out of these three choices:
(a) (1/ cos(x) − 1) sin(x),
(b) (x^3)/2
(c) tan(x)*(sin(x)^2)/(cos(x) + 1).
I understand that when x is close to zero, tan(x) and sin(x) are nearly the same. I don't understand how or why any of these algorithms are better or worse for solving the problem.
Another rule of thumb usually used is this: When adding a long series of numbers, start adding from numbers closest to zero and end with the biggest numbers.
Explaining why this is good is a bit tricky. When you're adding small numbers to a large number, there is a chance they will be completely discarded because they are smaller than the lowest digit in the current mantissa of the large number. Take for instance this situation:
a = 1,000,000;
do 100,000,000 times:
a += 0.01;
If 0.01 is smaller than the lowest mantissa digit, then the loop does nothing and the end result is a == 1,000,000.
But if you do it like this:
a = 0;
do 100,000,000 times:
a += 0.01;
a += 1,000,000;
then the small numbers slowly grow and you're more likely to end up with something close to a == 2,000,000, which is the right answer.
This is of course an extreme example, but I hope you get the idea.
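The effect can be reproduced with a small simulation. This sketch assumes 32-bit floats (near 1,000,000 their spacing is 0.0625, so 0.01 is lost on every addition) and emulates them with the struct module. It uses 10,000 iterations instead of 100,000,000 to keep the run short, and the helper names f32 and add_f32 are made up for the example.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def add_f32(a, b):
    """Add two 32-bit float values and round the sum back to 32 bits."""
    return f32(a + b)

STEP = f32(0.01)
N = 10_000  # scaled down from the 100,000,000 in the text

# Large value first: each 0.01 is rounded away immediately.
large_first = f32(1_000_000.0)
for _ in range(N):
    large_first = add_f32(large_first, STEP)

# Small values first: the running sum is small enough to absorb each step.
small_first = f32(0.0)
for _ in range(N):
    small_first = add_f32(small_first, STEP)
small_first = add_f32(small_first, f32(1_000_000.0))

print(large_first)  # 1000000.0, all 10,000 additions were lost
print(small_first)  # close to the true total of 1000100
```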
I had to take a numerics class back when I was an undergrad, and it was thoroughly painful. Anyhow, IEEE 754 is the floating point standard typically implemented by modern CPUs. It's useful to understand the basics of it, as this gives you a lot of intuition about what not to do. The simplified explanation of it is that computers store floating point numbers in something like base-2 scientific notation with a fixed number of digits (bits) for the exponent and for the mantissa. This means that the larger the absolute value of a number, the less precisely it can be represented. For 32-bit floats in IEEE 754, half of the possible bit patterns represent between -1 and 1, even though numbers up to about 10^38 are representable with a 32-bit float. For values larger than 2^24 (approximately 16.7 million) a 32-bit float cannot represent all integers exactly.
What this means for you is that you generally want to avoid the following:
Having intermediate values be large when the final answer is expected to be small.
Adding/subtracting small numbers to/from large numbers. For example, if you wrote something like:
for(float index = 17000000; index < 17000002; index++) {}
This loop would never terminate because 17,000,000 + 1 is rounded back down to 17,000,000 (above 2^24, adjacent 32-bit floats are 2 apart).
If you had something like:
float foo = 10000000.0f - 10000000.0001f;
The value for foo would be 0, not -0.0001, due to rounding error.
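Both rounding effects can be checked without a C compiler. The sketch below emulates 32-bit floats in Python through the struct module, on the assumption that every value is rounded to single precision; the helper name f32 is hypothetical.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Incrementing a 32-bit float above 2**24: the +1 is rounded away.
index = f32(17_000_000.0)
next_index = f32(index + 1.0)
print(next_index == index)  # True, so the loop variable never advances

# Near 10,000,000 the spacing of 32-bit floats is 1, so the .0001 is lost.
foo = f32(f32(10_000_000.0) - f32(10_000_000.0001))
print(foo)                  # 0.0
```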
My question is what are some other
common situations I should be looking
for, and what are considered 'good'
methods of approaching them?
There are several ways you can have severe or even catastrophic loss of precision.
The most important reason is that floating-point numbers have a limited number of digits, e.g. doubles have 53 bits. That means if you have "useless" digits which are not part of the solution but must be stored, you lose precision.
For example (we are using decimal types for demonstration):
2.598765000000000000000000000100 -
2.598765000000000000000000000099
The interesting part is the 100-99 = 1 answer. As 2.598765 is equal in both cases, it
does not change the result, but wastes 8 digits. Much worse, because the computer doesn't
know that the digits are useless, it is forced to store them and cram 21 zeroes after them,
wasting 29 digits in all. Unfortunately there is no way to circumvent this for differences,
but there are other cases, e.g. exp(x)-1, which is a function occurring very often in physics.
The exp function near 0 is almost linear, but it enforces a 1 as leading digit. So with 12
significant digits
exp(0.001)-1 = 1.00100050017 - 1 = 1.00050017e-3
If we instead use a function expm1(), built from the Taylor series with the leading 1 cancelled analytically:
1 + x + x^2/2 + x^3/6 ... - 1 =
x + x^2/2 + x^3/6 =: expm1(x)
expm1(0.001) = 1.00050016667e-3
Much better.
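Python's standard library ships this function as math.expm1, so the difference is easy to see in double precision. The following is a small sketch; the choice of x = 1e-10 and the helper name relative_error are just for illustration.

```python
import math

x = 1e-10
naive = math.exp(x) - 1.0    # the leading 1 swallows most digits of x
accurate = math.expm1(x)     # computed without forming exp(x) first

# For x this small, x + x**2/2 agrees with the true value to double precision.
reference = x + x * x / 2.0

def relative_error(value):
    """Relative distance of `value` from the reference result."""
    return abs(value - reference) / reference

print(relative_error(naive) > 1e-9)      # True: only about 7 digits are right
print(relative_error(accurate) < 1e-14)  # True: accurate to full precision
```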
The second problem is functions with a very steep slope, like the tangent of x near pi/2 (or near any odd multiple of it).
tan(11) has a slope of about 50000, which means that any small deviation caused by earlier rounding errors
will be amplified by a factor of 50000! Or you have singularities, e.g. if the result approaches 0/0, which means it can have any value.
In both cases you create a substitute function, simplifying the original function. It is of no use to highlight the different solution approaches because without training you will simply not "see" the problem in the first place.
A very good book to learn and train: Forman S. Acton: Real Computing Made Real
Another thing to avoid is subtracting numbers that are nearly equal, as this can also lead to increased sensitivity to roundoff error. For values near 0, cos(x) will be close to 1, so 1/cos(x) - 1 is one of those subtractions that you'd like to avoid if possible, so I would say that (a) should be avoided.
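To make the comparison concrete, here is a rough sketch that evaluates the three candidates from the question in double precision. The function names f_a, f_b and f_c are made up to mirror the labels (a), (b) and (c). Form (c) is an exact rewrite, since tan(x) - sin(x) = tan(x)(1 - cos(x)) and 1 - cos(x) = sin(x)^2/(1 + cos(x)), while (b) is only the leading Taylor term.

```python
import math

def f_a(x):
    # (a): 1/cos(x) - 1 subtracts two nearly equal numbers when x is near 0.
    return (1.0 / math.cos(x) - 1.0) * math.sin(x)

def f_b(x):
    # (b): leading Taylor term only, good for small x and nowhere else.
    return x**3 / 2.0

def f_c(x):
    # (c): algebraically equal to tan(x) - sin(x), with no cancellation.
    return math.tan(x) * math.sin(x) ** 2 / (math.cos(x) + 1.0)

x = 1e-9
print(f_a(x))  # 0.0, because cos(x) rounds to exactly 1.0
print(f_b(x))  # about 5e-28
print(f_c(x))  # about 5e-28

# Away from zero (c) still matches the direct formula, while (b) does not.
print(f_c(1.0), math.tan(1.0) - math.sin(1.0), f_b(1.0))
```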
