Mathematica internal number formats and precision - wolfram-mathematica

Tangentially related to this question, what exactly is happening here with the number formatting?
In[1]:= InputForm @ 3.12987*10^-270
Out[1]//InputForm= 3.12987`*^-270
In[2]:= InputForm @ 3.12987*10^-271
Out[2]//InputForm= 3.1298700000000003`*^-271
If you use *10.^ as the multiplier, the transition happens where you would naively expect it to:
In[3]:= InputForm @ 3.12987*10.^-16
Out[3]//InputForm= 3.12987`*^-16
In[4]:= InputForm @ 3.12987*10.^-17
Out[4]//InputForm= 3.1298700000000004`*^-17
whereas *^ takes the transition a bit further, though there it is the machine precision that starts flaking out:
In[5]:= InputForm @ 3.12987*^-308
Out[5]//InputForm= 3.12987`*^-308
In[6]:= InputForm @ 3.12987*10.^-309
Out[6]//InputForm= 3.12987`15.954589770191008*^-309
The significand itself starts breaking up only much later:
In[7]:= InputForm @ 3.12987*^-595
Out[7]//InputForm= 3.12987`15.954589770191005*^-595
In[8]:= InputForm @ 3.12987*^-596
Out[8]//InputForm= 3.1298699999999999999999999999999999999999`15.954589770191005*^-596
I am assuming these transitions relate to the format in which Mathematica internally keeps its numbers, but does anyone know, or care to hazard an educated guess at, how?

If I understand correctly, you are wondering when InputForm will show more than six digits. If so, it happens haphazardly, whenever more digits are required to "best" represent the number obtained after evaluation. Since the evaluation involves explicit multiplication by 10^(some power), and since the decimal input need not be (and in this case is not) exactly representable in binary, you can get small differences from what you expect.
In[26]:= Table[3.12987*10^-j, {j, 10, 25}] // InputForm
Out[26]//InputForm=
{3.12987*^-10,
3.12987*^-11,
3.12987*^-12,
3.12987*^-13,
3.12987*^-14,
3.12987*^-15,
3.12987*^-16,
3.1298700000000004*^-17,
3.1298700000000002*^-18,
3.12987*^-19,
3.12987*^-20,
3.1298699999999995*^-21,
3.1298700000000003*^-22,
3.1298700000000004*^-23,
3.1298700000000002*^-24,
3.1298699999999995*^-25}
As for the *^ input syntax, that is effectively a parsing (actually lexical) construct. No explicit exact power of 10 is computed. A floating-point value is constructed that is as faithful to your input as binary-to-decimal conversion allows. InputForm will show as many digits as were used to input the number, because that is indeed the closest decimal to the corresponding binary value that got created.
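To see the same two behaviors outside Mathematica, here is a small Go sketch (an illustration, not part of the original answer). Go's default float printing is similar in spirit to InputForm here: it emits the shortest decimal that round-trips to the same float64.

package main

import (
	"fmt"
	"math"
)

func main() {
	// A decimal literal is converted to the nearest float64 in one step,
	// so the shortest round-tripping decimal is the six digits you typed.
	fmt.Println(3.12987e-17) // 3.12987e-17

	// Building the same values by explicit multiplication rounds more than
	// once (the power of ten, then the product), so some results need many
	// more digits to round-trip, just as in the Table above. Exactly which
	// entries pick up extra digits depends on how the roundings happen to fall.
	for j := 10; j <= 25; j++ {
		fmt.Println(3.12987 * math.Pow(10, float64(-j)))
	}
}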
When you surpass the limitations of machine floating-point numbers, you get an arbitrary-precision analog. Its precision is no longer MachinePrecision but rather $MachinePrecision (that is the bignum analog to machine floats in Mathematica).
What you see in InputForm for 3.12987*^-596 (a decimal ending with a slew of 9's) is, I believe, caused by Mathematica's internal representation using guard bits. Were there only 53 mantissa bits, analogous to a machine double, the closest decimal representation would be the expected six digits.
Daniel Lichtblau
Wolfram Research

Related

Float Arithmetic inconsistent between golang programs

When decoding audio files with pion/opus I will occasionally get values that are incorrect.
I have debugged it down to the following code. When this routine runs inside the Opus decoder I get a different value than when I run it outside. When the two floats are added together, the rightmost bit is different. The difference in values eventually becomes a problem as the program runs longer.
Is this a bug or expected behavior? I don't know how to debug this deeper/dump state of my program to understand more.
Outside decoder
package main

import (
	"fmt"
	"math"
)

func main() {
	a := math.Float32frombits(uint32(955684399))
	b := math.Float32frombits(uint32(927295728))
	fmt.Printf("%b\n", math.Float32bits(a))
	fmt.Printf("%b\n", math.Float32bits(b))
	fmt.Printf("%b\n", math.Float32bits(a+b))
}
Returns
111000111101101001011000101111
110111010001010110100011110000
111001000001111010000110100110
Then Inside decoder
fmt.Printf("%b\n", math.Float32bits(lpcVal))
fmt.Printf("%b\n", math.Float32bits(val))
fmt.Printf("%b\n", math.Float32bits(lpcVal+val))
Returns
111000111101101001011000101111
110111010001010110100011110000
111001000001111010000110100111
I guess that lpcVal and val are not float32 but rather float64.
If that is the case, then you are comparing two different operations:
in the former case, you effectively compute float32(lpcVal) + float32(val);
in the latter case, you compute float32(lpcVal + val), with the addition carried out in float64.
The two 32-bit floats are, in binary:
1.11101101001011000101111 * 2^-14
1.10001010110100011110000 * 2^-17
The exact sum is
1.000011110100001101001101 * 2^-13
which is an exact tie between two representable Float32 values;
the result is rounded to the Float32 with the even significand:
1.00001111010000110100110 * 2^-13
But lpcVal and val are Float64: instead of 23 bits after the binary point, they have 52 (29 more).
If a single bit among those 29 extra bits is nonzero, the result is no longer an exact tie but slightly above it.
Once converted to nearest Float32, that will be
1.00001111010000110100111 * 2^-13
Since we have no idea what lpcVal and val contain in those low-order bits, anything can happen, even without any FMA operations involved.
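To make the rounding argument concrete, here is a self-contained Go sketch (not from the original post). The 1e-12 nudge stands in for whatever nonzero low-order float64 bits the decoder's values might carry; it is small enough not to change which two float32 values bracket the sum, but large enough to break the tie:

package main

import (
	"fmt"
	"math"
)

func main() {
	a := math.Float32frombits(uint32(955684399))
	b := math.Float32frombits(uint32(927295728))

	// Pure float32 addition: the exact sum is a tie, so it rounds to the
	// candidate with the even significand (expected to end in ...100110,
	// matching the "outside decoder" output above).
	fmt.Printf("%b\n", math.Float32bits(a+b))

	// The same sum in float64 is exact (it fits easily in 53 bits), so
	// converting back to float32 still hits the tie and rounds to even.
	fmt.Printf("%b\n", math.Float32bits(float32(float64(a)+float64(b))))

	// If the float64 operands carry anything extra in their low-order bits,
	// the sum is no longer an exact tie and rounds the other way
	// (expected to end in ...100111, as seen inside the decoder).
	fmt.Printf("%b\n", math.Float32bits(float32(float64(a)+float64(b)+1e-12)))
}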
This was happening because of fused multiply-add (FMA): multiple floating-point operations were being combined into a single operation.
You can read more about it in the Go Language Spec#Floating_Point_Operators
The change I made to my code was
- lpcVal += currentLPCVal * (aQ12 / 4096.0)
+ lpcVal = float32(lpcVal) + float32(currentLPCVal)*float32(aQ12)/float32(4096.0)
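The explicit conversions work because, per the Go spec section linked above, an explicit floating-point type conversion rounds to the precision of the target type, which prevents fusion. A minimal sketch of the two shapes (function names are mine):

// mayFuse leaves the compiler free to evaluate x*y + z as a single fused
// multiply-add, keeping the product at higher internal precision.
func mayFuse(x, y, z float32) float32 { return x*y + z }

// noFuse forces the product to be rounded to float32 before the addition,
// matching a plain "multiply, then add" done entirely in float32.
func noFuse(x, y, z float32) float32 { return float32(x*y) + z }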
Thank you to Bryan C. Mills for answering this on the #performance channel on the Gophers slack.

Why is floating point addition imprecise? [duplicate]

This question already has answers here:
Is floating point math broken? (31 answers)
Why are floating point numbers inaccurate? (5 answers)
Closed 3 years ago.
I found the following strange behavior: adding some floats results in "random" accuracy.
My setup:
go version go1.12 darwin/amd64
on macOS Mojave (10.14.3) with an Intel i7 at 2.6 GHz.
The behavior occurs in the following example:
func TestFloatingAddition(t *testing.T) {
	f1 := float64(5)
	f2 := float64(12.1)
	f5 := float64(-12.1)
	f3 := f1 + f2 // 17.1
	f4 := f3 + f5 // 5.000000000000002
	if f4 != f1 {
		t.Fatal("addition is not reversible")
	}
}
Can someone explain to me why f4 takes on this strange value, and what can I do to fix it?
This is not a problem with Go (or C, C++, Java, Python, Ruby), or any modern language that uses IEEE-754 floating point to represent floating-point numbers. Some numbers simply cannot be represented exactly in binary (or decimal) floating-point storage formats.
An IEEE-754 floating-point number has three parts: a sign bit, an exponent (stored with a bias of 127, 1023, etc., which is subtracted to recover the actual exponent), and a mantissa. The mantissa is a binary fraction, shifted left or right according to the exponent. And with binary fractions lies the problem.
In the same way that the fraction 1/3 cannot be exactly represented in decimal, certain numbers cannot be expressed exactly as binary fractions. Decimal and binary are "relatively prime" in this sense, since 10 = 2*5 carries the extra factor 5: you cannot express 1/5 exactly as a binary fraction, just as 1/3, 1/7, 1/11, 1/13, 1/17, etc. (notice the pattern of prime numbers here?) cannot be expressed exactly in either decimal or binary fractions. The internal representation always approximates such numbers, and string-conversion libraries round on output to reduce the visible approximation error.
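A quick Go illustration of the point (illustrative only; the exact printed digits may vary in the last place):

package main

import "fmt"

func main() {
	a := 5.0
	b := 12.1 // stored as the closest float64, which is slightly below 12.1

	fmt.Printf("%.17f\n", b)     // prints something close to, but not exactly, 12.1
	fmt.Printf("%.17f\n", a+b-b) // roughly 5.000000000000002, not 5
	fmt.Println(a+b-b == a)      // false
}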
What can you do? If you are using only linear arithmetic operators, you could use fixed-point decimal libraries (that is what [shudder] COBOL does).
Some libraries store fractional numbers as ratios of two whole numbers, but this does not solve the problem once you introduce functions such as square root, which can produce irrational numbers.
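For the linear-arithmetic case, Go's math/big package already ships such a rational type. A short sketch (my own addition, not part of the original answer) using big.Rat, which stores 12.1 exactly as 121/10:

package main

import (
	"fmt"
	"math/big"
)

func main() {
	a := new(big.Rat).SetInt64(5)
	b, ok := new(big.Rat).SetString("12.1") // exactly 121/10
	if !ok {
		panic("bad rational literal")
	}

	sum := new(big.Rat).Add(a, b)    // 171/10
	diff := new(big.Rat).Sub(sum, b) // exactly 5 again

	fmt.Println(diff.RatString())   // 5
	fmt.Println(diff.Cmp(a) == 0)   // true
}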
A possible solution is a tolerance-based comparison:
const float64EqThreshold = 1e-9

func equalEnough(a, b float64) bool {
	return math.Abs(a-b) <= float64EqThreshold
}
so the test would look like:
func TestFloatingAddition(t *testing.T) {
	f1 := float64(5)
	f2 := float64(12.1)
	f5 := float64(-12.1)
	f3 := f1 + f2 // 17.1
	f4 := f3 + f5 // 5.000000000000002
	if !equalEnough(f1, f4) {
		t.Fatal("addition is not reversible")
	}
}

How to parse long hexadecimal string into uint

I'm pulling in data that is in long hexadecimal string form which I need to convert into decimal notation, truncate 18 decimal places, and then serve up in JSON.
For example I may have the hex string:
"0x00000000000000000000000000000000000000000000d3c21bcecceda1000000"
At first I was attempting to use ParseUint(), however since it only handles 64-bit values, my number ends up being way too big.
This example, after conversion and truncation, results in 10^6.
However, there are instances where this number can be up to 10^12 (meaning 10^30 before truncation!).
What is the best strategy to attack this?
Use math/big for working with numbers larger than 64 bits.
From the Int.SetString example:
s := "d3c21bcecceda1000000"
i := new(big.Int)
i.SetString(s, 16)
fmt.Println(i)
https://play.golang.org/p/vf31ce93vA
The math/big types also support the encoding.TextMarshaler and fmt.Scanner interfaces.
For example
i := new(big.Int)
fmt.Sscan("0x000000d3c21bcecceda1000000", i)
Or
i := new(big.Int)
fmt.Sscanf("0x000000d3c21bcecceda1000000", "0x%x", i)
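Putting the pieces of the original question together (parse the hex string, drop the last 18 decimal digits, serve the result as JSON), a sketch along these lines should work. The "value" field name and the use of integer division for the truncation are illustrative choices, not something prescribed by the answer above:

package main

import (
	"encoding/json"
	"fmt"
	"math/big"
	"strings"
)

func main() {
	raw := "0x00000000000000000000000000000000000000000000d3c21bcecceda1000000"

	i := new(big.Int)
	if _, ok := i.SetString(strings.TrimPrefix(raw, "0x"), 16); !ok {
		panic("invalid hexadecimal input")
	}

	// Truncate the last 18 decimal places by integer division by 10^18.
	scale := new(big.Int).Exp(big.NewInt(10), big.NewInt(18), nil)
	i.Quo(i, scale)

	// *big.Int marshals to JSON as a plain, arbitrarily large number.
	out, err := json.Marshal(map[string]*big.Int{"value": i})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"value":1000000}
}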

a very strange question in mathematica

I am doing this in Mathematica 7.0:
r[x_] := Rationalize[x, 0];
N[Nest[Sqrt, 10., 53] // r, 500]
It gave me
1.000000000000000222044604925031308084726333618164062500000000000000000
However, if I go one step further
N[Nest[Sqrt, 10., 54] // r, 500]
I got all zeros. Does anybody know an explanation, or is it a bug?
Also, it looks like this way of producing more digits from Nest[Sqrt, 10., 53] is not working very well. How can I obtain more significant digits for this calculation?
Many thanks.
Edit
If I did Nest[Sqrt, 10., 50], I still got a lot of significant digits.
You have no significant digits other than zeros if you do this 54 times. Hence rationalizing as you do (which simply preserves bit pattern) gives what you saw.
InputForm[n53 = Nest[Sqrt, 10., 53]]
Out[180]//InputForm=
1.0000000000000002
InputForm[n54 = Nest[Sqrt, 10., 54]]
Out[181]//InputForm=
1.
Rationalize[n53, 0]
4503599627370497/4503599627370496
Rationalize[n54, 0]
Out[183]= 1
For the curious: the issue is not loss of precision in the sense of degradation as the iterations proceed. Indeed, iterating these square roots actually increases precision. We can see this with bignum input.
InputForm[n54 = Nest[Sqrt, 10.`20, 54]]
Out[188]//InputForm=
1.0000000000000001278191493200323453724568038240908339267044`36.25561976585499
Here is the actual problem. When we use machine numbers, then after 54 iterations there are no significant digits other than zeros in the resulting machine double. That is to say, the size restriction on machine numbers is the cause.
The reason is not too mysterious. Call the resulting value 1+eps. Then we have (1+eps)^(2^54) equal (to close approximation) to 10. A second order expansion then shows eps must be smaller than machine epsilon.
InputForm[epsval =
First[Select[
eps /. N[Solve[Sum[eps^j*Binomial[2^54, j], {j, 2}] == 9, eps]],
Head[#] === Real && # > 0 &]]]
Out[237]//InputForm=
1.864563472253985*^-16
$MachineEpsilon
Out[235]= 2.22045*10^-16
Daniel Lichtblau
Wolfram Research
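As an aside (not part of either answer), the same collapse can be reproduced in any IEEE-754 double arithmetic, for instance with this small Go sketch; Go's math.Sqrt is correctly rounded to double precision:

package main

import (
	"fmt"
	"math"
)

func main() {
	x := 10.0
	for i := 0; i < 54; i++ {
		x = math.Sqrt(x)
	}
	// The 53rd iterate is 1 plus one ulp (about 1.0000000000000002); its
	// square root lies just below the midpoint between 1 and 1+ulp, so the
	// 54th iterate rounds down to exactly 1.0.
	fmt.Println(x)        // 1
	fmt.Println(x == 1.0) // true
}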
InputForm /@ NestList[Sqrt, 10., 54]
10.
3.1622776601683795
1.7782794100389228
1.333521432163324
1.1547819846894583
1.0746078283213176
1.036632928437698
1.018151721718182
1.0090350448414476
1.0045073642544626
1.002251148292913
1.00112494139988
1.0005623126022087
1.00028111678778
1.0001405485169472
1.0000702717894114
1.000035135277462
1.0000175674844227
1.0000087837036347
1.0000043918421733
1.0000021959186756
1.000001097958735
1.0000005489792168
1.0000002744895706
1.000000137244776
1.0000000686223856
1.000000034311192
1.0000000171555958
1.0000000085777978
1.0000000042888988
1.0000000021444493
1.0000000010722245
1.0000000005361123
1.0000000002680562
1.0000000001340281
1.000000000067014
1.000000000033507
1.0000000000167535
1.0000000000083769
1.0000000000041884
1.0000000000020943
1.0000000000010472
1.0000000000005236
1.0000000000002618
1.000000000000131
1.0000000000000655
1.0000000000000329
1.0000000000000164
1.0000000000000082
1.000000000000004
1.000000000000002
1.0000000000000009
1.0000000000000004
1.0000000000000002
1.
Throwing N[x, 500] on this is like trying to squeeze water from a rock.
The calculations above are done in machine precision, which is very fast. If you are willing to give up speed, you can utilize Mathematica's arbitrary precision arithmetic by specifying a non-machine precision on the input values. The "backtick" can be used to do this (as in the example below) or you can use SetPrecision or SetAccuracy. Here I will specify that the input is the number 10 up to 20 digits of precision.
NestList[Sqrt, 10`20, 54]
10.000000000000000000
3.1622776601683793320
1.77827941003892280123
.
.
.
1.00000000000000051127659728012947952
1.00000000000000025563829864006470708
1.000000000000000127819149320032345372
As you can see you do not need to use InputForm as Mathematica will automatically print arbitrary-precision numbers to as many places as it accurately can.
If you do use InputForm or FullForm you will see a backtick and then a number, which is the current precision of that number.
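As a rough parallel outside Mathematica (my own addition), Go's math/big.Float can stand in for the arbitrary-precision reals; the 70-bit precision (about 21 decimal digits) is chosen to roughly match the 10`20 input above:

package main

import (
	"fmt"
	"math/big"
)

func main() {
	x := new(big.Float).SetPrec(70).SetInt64(10)
	for i := 0; i < 54; i++ {
		x.Sqrt(x) // correctly rounded at 70 bits of precision
	}
	// Should print roughly 1.0000000000000001278, consistent with the
	// 1.00000000000000012781914932... value shown above.
	fmt.Println(x.Text('g', 20))
}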

Show a number with specified number of significant digits

I use the following function to convert a number to a string for display purposes (don't use scientific notation, don't use a trailing dot, round as specified):
(* Show Number. Convert to string w/ no trailing dot. Round to the nearest r. *)
Unprotect[Round]; Round[x_,0] := x; Protect[Round];
shn[x_, r_:0] := StringReplace[
  ToString@NumberForm[Round[N@x,r], ExponentFunction->(Null&)], re@"\\.$"->""]
(Note that re is an alias for RegularExpression.)
That's been serving me well for years.
But sometimes I don't want to specify the number of digits to round to, rather I want to specify a number of significant figures.
For example, 123.456 should display as 123.5 but 0.00123456 should display as 0.001235.
To get really fancy, I might want to specify significant digits both before and after the decimal point.
For example, I might want .789 to display as 0.8 but 789.0 to display as 789 rather than 800.
Do you have a handy utility function for this sort of thing, or suggestions for generalizing my function above?
Related: Suppressing a trailing "." in numerical output from Mathematica
UPDATE: I tried asking a general version of this question here:
https://stackoverflow.com/questions/5627185/displaying-numbers-to-non-technical-users
dreeves, I think I finally understand what you want, and you already had it, pretty much. If not, please try again to explain what I am missing.
shn2[x_, r_: 0] :=
StringReplace[
  ToString@NumberForm[x, r, ExponentFunction -> (Null &)],
  RegularExpression@"\\.0*$" -> ""]
Testing:
shn2[#, 4] & /@ {123.456, 0.00123456}
shn2[#, {3, 1}] & /@ {789.0, 0.789}
shn2[#, {10, 2}] & /@ {0.1234, 1234.}
shn2[#, {4, 1}] & /@ {12.34, 1234.56}
Out[1]= {"123.5", "0.001235"}
Out[2]= {"789", "0.8"}
Out[3]= {"0.12", "1234"}
Out[4]= {"12.3", "1235"}
This may not be the complete answer (you still need to convert from/to a string), but this function takes as arguments a number x and the number of significant figures sig wanted. The number of digits it keeps is the maximum of sig and the number of digits to the left of the decimal point.
A[x_,sig_]:=NumberForm[x, Max[Last[RealDigits[x]], sig]]
RealDigits
Here's a possible generalization of my original function.
(I've determined that it's not equivalent to Mr Wizard's solution but I'm not sure yet which I think is better.)
re = RegularExpression;
(* Show Number. Convert to string w/ no trailing dot. Use at most d significant
figures after the decimal point. Target t significant figures total (clipped
to be at least i and at most i+d, where i is the number of digits in integer
part of x). *)
shn[x_, d_:5, t_:16] := ToString[x]
shn[x_?NumericQ, d_:5, t_:16] := With[{i= IntegerLength@IntegerPart@x},
  StringReplace[ToString@NumberForm[N@x, Clip[t, {i,i+d}],
    ExponentFunction->(Null&)],
    re@"\\.$"->""]]
Testing:
Here we specify 4 significant digits, but never dropping any to the left of the decimal point and never using more than 2 significant digits to the right of the decimal point.
(# -> shn[#, 2, 4])& /@
{123456, 1234.4567, 123.456, 12.345, 1.234, 1.0001, 0.123, .0001234}
{ 123456 -> "123456",
  1234.4567 -> "1234",
  123.456 -> "123.5",
  12.345 -> "12.35",
  1.234 -> "1.23",
  1.0001 -> "1",
  0.123 -> "0.12",
  0.0001234 -> "0.00012" }
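For comparison, the core behavior asked for in the question (n significant figures, no scientific notation, no trailing dot) looks like this as a small Go sketch; the function name sigFigs is mine, and the before/after-the-decimal-point clipping from the update above is deliberately left out:

package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// sigFigs formats x with n significant figures, without scientific notation
// and without a trailing decimal point. (Edge cases around exact powers of
// ten, where Log10 can land just below an integer, are ignored in this sketch.)
func sigFigs(x float64, n int) string {
	if x == 0 {
		return "0"
	}
	order := int(math.Floor(math.Log10(math.Abs(x)))) // position of the leading digit
	decimals := n - 1 - order
	if decimals < 0 {
		// Fewer significant figures than integer digits: round to a power of
		// ten, e.g. 789 with n=1 becomes 800.
		p := math.Pow(10, float64(-decimals))
		return strconv.FormatFloat(math.Round(x/p)*p, 'f', 0, 64)
	}
	s := strconv.FormatFloat(x, 'f', decimals, 64)
	if strings.Contains(s, ".") {
		s = strings.TrimRight(s, "0")
		s = strings.TrimSuffix(s, ".")
	}
	return s
}

func main() {
	fmt.Println(sigFigs(123.456, 4))    // 123.5
	fmt.Println(sigFigs(0.00123456, 4)) // 0.001235
	fmt.Println(sigFigs(0.789, 1))      // 0.8
	fmt.Println(sigFigs(789.0, 3))      // 789
}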

Resources