Why do these two float64s have different values?

Consider these two cases:
fmt.Println(912 * 0.01)
fmt.Println(float64(912) * 0.01)
(Go Playground link)
The second one prints 9.120000000000001, which is actually fine; I understand why that is happening.
However, why does the first line print 9.12, without the …01 at the end? Does Go multiply the two untyped constants and simply replace them with a 9.12 literal when compiling?

As per spec:
Constant expressions are always evaluated exactly; intermediate values and the constants themselves may require precision significantly larger than supported by any predeclared type in the language.
Since
912 * 0.01
is a constant expression, it is evaluated exactly. Thus, writing fmt.Println(912 * 0.01) has the same effect as writing fmt.Println(9.12). When you pin 912 to float64, the other operand of the floating-point multiplication is implicitly converted to float64 as well, so the expression float64(912) * 0.01 behaves like float64(912) * float64(0.01). 0.01 is not exactly representable in a float64, so precision is lost before the multiplication, whereas in your first example the exact constant result 9.12 is rounded only once, when it is converted to float64 as the argument of fmt.Println(). This explains the different results.
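To see both effects in one place, here is a minimal runnable sketch (the trailing digits are the usual float64 rounding artifacts):
package main

import "fmt"

func main() {
	fmt.Println(912 * 0.01)          // 9.12: exact constant result, rounded once to float64
	fmt.Println(float64(912) * 0.01) // 9.120000000000001: 0.01 is rounded to float64 before multiplying
	// Printing more digits shows that neither float64 value is exactly 9.12:
	fmt.Printf("%.20f\n%.20f\n", 912*0.01, float64(912)*0.01)
}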

The reason for the different output is that in the first case 912 * 0.01 is a multiplication of two untyped constant values, which is carried out with arbitrary precision; only the result is converted to float64 when the value is passed to Println(). (See the Constant expressions section of the language specification for details.)
In the second case, float64(912) * 0.01, first 912 is converted to float64, then the untyped constant 0.01 is also converted to float64, and these two float64 values are multiplied. That multiplication is not performed with arbitrary precision and does not give an exact result.
Note:
In the first case the result will be converted to float64 when passed to the Println():
fmt.Printf("%T %v\n", 912 * 0.01, 912 * 0.01)
Output:
float64 9.12
Test it on Go Playground


Misunderstanding Go Language specification on floating-point rounding

The Go language specification on the section about Constant expressions states:
A compiler may use rounding while computing untyped floating-point or complex constant expressions; see the implementation restriction in the section on constants. This rounding may cause a floating-point constant expression to be invalid in an integer context, even if it would be integral when calculated using infinite precision, and vice versa.
Does the sentence
This rounding may cause a floating-point constant expression to be invalid in an integer context
point to something like the following:
func main() {
	a := 853784574674.23846278367
	fmt.Println(int8(a)) // output: 0
}
The quoted part from the spec does not apply to your example, as a is not a constant expression but a variable, so int8(a) is converting a non-constant expression. This conversion is covered by Spec: Conversions, Conversions between numeric types:
When converting a floating-point number to an integer, the fraction is discarded (truncation towards zero).
[...] In all non-constant conversions involving floating-point or complex values, if the result type cannot represent the value the conversion succeeds but the result value is implementation-dependent.
Since you convert the non-constant expression a, whose value is 853784574674.23846278367, to an integer, the fractional part is discarded; and since the result does not fit into an int8, the result is not specified, it's implementation-dependent.
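For values that do fit the target type, the truncation rule is easy to observe. A minimal sketch (note the conversion must be non-constant; a constant conversion like int8(2.9) would not even compile):
package main

import "fmt"

func main() {
	f := 2.9
	fmt.Println(int8(f)) // 2: fraction discarded
	f = -2.9
	fmt.Println(int8(f)) // -2: truncation towards zero, not flooring
}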
The quoted part means that while constants are represented with much higher precision than the built-in types (e.g. float64 or int64), the precision a compiler has to implement is not infinite (for practical reasons), so even if a floating-point literal is representable precisely, carrying out operations on such constants may involve intermediate roundings and may not give the mathematically correct result.
The spec also specifies the minimum precision that compilers must support:
Implementation restriction: Although numeric constants have arbitrary precision in the language, a compiler may implement them using an internal representation with limited precision. That said, every implementation must:
Represent integer constants with at least 256 bits.
Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed binary exponent of at least 16 bits.
Give an error if unable to represent an integer constant precisely.
Give an error if unable to represent a floating-point or complex constant due to overflow.
Round to the nearest representable constant if unable to represent a floating-point or complex constant due to limits on precision.
For example:
const (
	x = 1e100000 + 1
	y = 1e100000
)

func main() {
	fmt.Println(x - y)
}
This code should output 1, as x is 1 larger than y. Running it on the Go Playground outputs 0, because the constant expression x - y is evaluated with rounding, and the +1 is lost as a result. Both x and y are integral (they have no fractional part), so in an integer context the result should be 1. But representing 1e100000 requires roughly 333000 bits, which a compiler is not required to support (according to the spec, a 256-bit mantissa suffices).
If we lower the constants, we get correct result:
const (
	x = 1e1000 + 1
	y = 1e1000
)

func main() {
	fmt.Println(x - y)
}
This outputs the mathematically correct result 1. Try it on the Go Playground. Representing the number 1e1000 requires roughly 3322 bits, which seems to be supported (and is way above the minimum 256-bit requirement).
An int8 is a signed integer type that can hold values from -128 to 127. That's why you are seeing an unexpected value from the int8(a) conversion.
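By contrast, applying the same conversion to a constant is rejected at compile time, because constant conversions must yield a value representable in the target type. A sketch (this deliberately does not compile; the exact error message varies by compiler version):
package main

func main() {
	const c = 853784574674.23846278367
	_ = int8(c) // compile-time error, roughly: constant truncated to integer / overflows int8

	a := 853784574674.23846278367
	_ = int8(a) // compiles; the out-of-range result is implementation-dependent
}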

Why is float32 more accurate than float64 in this case?

If I run the following piece of Go code:
fmt.Println(float32(0.1) + float32(0.2))
fmt.Println(float64(0.1) + float64(0.2))
the output is:
0.3
0.30000000000000004
It appears the result of the float32 sum is more exact than the result of the float64 sum. Why? I thought that float64 is always more precise than float32. How do I decide which one to pick to get the most accurate result?
It isn't. fmt.Println is just making it look more precise. Println uses %g for floating point and complex numbers. The docs say...
The default precision ... for %g ... is the smallest number of digits necessary to identify the value uniquely.
0.3 is sufficient to uniquely identify the float32 value, but the float64 value, being much more precise, needs more digits.
We can use fmt.Printf and %0.20g to force both numbers to display the same precision.
f32 := float32(0.1) + float32(0.2)
f64 := float64(0.1) + float64(0.2)
fmt.Printf("%0.20g\n", f32)
fmt.Printf("%0.20g\n", f64)
0.30000001192092895508
0.30000000000000004441
float64 is more precise. Neither is exact; that is the nature of floating-point numbers.
We can use strconv.FormatFloat to see what these numbers really are.
fmt.Println(strconv.FormatFloat(float64(f32), 'b', -1, 32))
fmt.Println(strconv.FormatFloat(f64, 'b', -1, 64))
10066330p-25
5404319552844596p-54
That is 10066330 * 2^-25 and 5404319552844596 * 2^-54.
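To double-check those forms, here is a minimal sketch using math/big to reconstruct the exact decimal values behind the two mantissa/exponent pairs:
package main

import (
	"fmt"
	"math/big"
)

func main() {
	f32 := new(big.Float).SetMantExp(big.NewFloat(10066330), -25)
	f64 := new(big.Float).SetMantExp(big.NewFloat(5404319552844596), -54)
	fmt.Println(f32.Text('f', 30)) // 0.300000011920928955078125000000
	fmt.Println(f64.Text('f', 30)) // 0.300000000000000044408920985006
}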

Strange loss of precision multiplying big.Float

If I parse a string into a big.Float with f.SetString("0.001") and then multiply it, I see a loss of precision. If I use f.SetFloat64(0.001), I don't lose precision. Even doing a strconv.ParseFloat("0.001", 64) and then calling f.SetFloat64() works.
Full example of what I'm seeing here:
https://play.golang.org/p/_AyTHJJBUeL
Expanded from this question: https://stackoverflow.com/a/47546136/105562
The difference in output is due to the imprecise representation of base-10 floating-point numbers in float64 (IEEE 754 binary format), and to the default precision and rounding of big.Float.
See this simple code to verify:
fmt.Printf("%.30f\n", 0.001)
f, ok := new(big.Float).SetString("0.001")
fmt.Println(f.Prec(), ok)
Output of the above (try it on the Go Playground):
0.001000000000000000020816681712
64 true
So what we see is that the float64 value 0.001 is not exactly 0.001, and the default precision of big.Float is 64.
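To make that concrete, a minimal sketch showing that the two paths produce two different approximations of 0.001 (one is the 53-bit float64 value widened to a big.Float, the other is 0.001 rounded directly to the default 64-bit precision):
package main

import (
	"fmt"
	"math/big"
)

func main() {
	fromFloat64 := new(big.Float).SetFloat64(0.001)
	fromString, _ := new(big.Float).SetString("0.001")
	fmt.Println(fromFloat64.Cmp(fromString)) // not 0: the two approximations differ
}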
If you increase the precision of the big.Float before setting it from the string value, you will get the expected result:
s := "0.001"
f := new(big.Float)
f.SetPrec(100)
f.SetString(s)
fmt.Println(s)
fmt.Println(BigFloatToBigInt(f))
Now the output is the expected one (try it on the Go Playground):
0.001
1000000000000000
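Note that BigFloatToBigInt is not defined above; it comes from the linked question. A hypothetical reconstruction, assuming it scales by 10^18 (a wei conversion, consistent with the 1000000000000000 output) and truncates to an integer:
// BigFloatToBigInt is a hypothetical reconstruction of the linked question's
// helper: scale the value by 1e18 (wei) and truncate it to an integer.
func BigFloatToBigInt(f *big.Float) *big.Int {
	scaled := new(big.Float).SetPrec(f.Prec()).Mul(f, big.NewFloat(1e18))
	i, _ := scaled.Int(nil) // truncates towards zero
	return i
}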

Why does 0.1 + 0.2 get 0.3 in Google Go?

As long as floating point is used, 0.1 cannot be represented exactly in memory, so I'd expect a sum like 0.1 + 0.2 to come out as 0.30000000000000004.
But when using Go to add 0.1 and 0.2, I'm getting 0.3.
fmt.Println(0.1 + 0.2)
// Output : 0.3
Why is 0.3 coming out instead of 0.30000000000000004 ?
It is because when you print it (e.g. with the fmt package), the printing function rounds to a certain number of fraction digits.
See this example:
const ca, cb = 0.1, 0.2
fmt.Println(ca + cb)
fmt.Printf("%.20f\n", ca+cb)
var a, b float64 = 0.1, 0.2
fmt.Println(a + b)
fmt.Printf("%.20f\n", a+b)
Output (try it on the Go Playground):
0.3
0.29999999999999998890
0.30000000000000004
0.30000000000000004441
First we used constants, because that's different from using (non-constant) values of type float64. Numeric constants represent exact values of arbitrary precision and do not overflow.
But when printing the result of ca+cb, the constant value has to be converted to a non-constant, typed value to be passed to fmt.Println(). This value will be of type float64, which cannot represent 0.3 exactly. fmt.Println() rounds it to roughly 16 fraction digits, which displays as 0.3; when we explicitly ask for 20 digits, we see it's not exact. Note that only the result 0.3 is converted to float64, because the constant arithmetic 0.1+0.2 is evaluated by the compiler (at compile time).
Next we started with variables of type float64, and, to no surprise, the output wasn't exactly 0.3; but this time even the default rounding gave a result different from 0.3. The reason is that in the first case (constants) it was 0.3 that was converted, while this time both 0.1 and 0.2 were converted to float64, neither of which is exact, and adding them produced a number farther from 0.3, far enough to show up even with the fmt package's default rounding.
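The one-ULP difference between the two cases can be made visible with the %b verb, which prints the mantissa and binary exponent directly; a minimal sketch:
package main

import "fmt"

func main() {
	const c = 0.1 + 0.2 // evaluated exactly by the compiler, rounded once on conversion
	var a, b float64 = 0.1, 0.2

	fmt.Printf("%b\n", float64(c)) // 5404319552844595p-54: the float64 nearest to 0.3
	fmt.Printf("%b\n", a+b)        // 5404319552844596p-54: one ULP above it
}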
Check out similar / relevant questions+answers to know more about the topic:
Why do these two float64s have different values?
How does Go perform arithmetic on constants?
Golang converting float64 to int error
Does go compiler's evaluation differ for constant expression and other expression
Why does adding 0.1 multiple times remain lossless?
Golang Round to Nearest 0.05
Go: Converting float64 to int with multiplier

How does Go perform arithmetic on constants?

I've been reading this post on constants in Go, and I'm trying to understand how they are stored and used in memory. You can perform operations on very large constants in Go, and as long as the result fits in memory, you can coerce that result to a type. For example, this code prints 10, as you would expect:
const Huge = 1e1000
fmt.Println(Huge / 1e999)
How does this work under the hood? At some point, Go has to store 1e1000 and 1e999 in memory, in order to perform operations on them. So how are constants stored, and how does Go perform arithmetic on them?
Short summary (TL;DR) is at the end of the answer.
Untyped arbitrary-precision constants don't live at runtime; constants live only at compile time (during compilation). So Go does not have to represent constants with arbitrary precision at runtime, only while compiling your application.
Why? Because constants do not get compiled into the executable binaries. They don't have to be. Let's take your example:
const Huge = 1e1000
fmt.Println(Huge / 1e999)
There is a constant Huge in the source code (and it will be in the package object), but it won't appear in your executable. Instead, a call to fmt.Println() will be recorded with the value passed to it, whose type will be float64. So only a float64 value equal to 10.0 will be recorded in the executable. There is no sign of any number equal to 1e1000 in the executable.
This float64 type is derived from the default type of the untyped constant Huge. 1e1000 is a floating-point literal. To verify it:
const Huge = 1e1000
x := Huge / 1e999
fmt.Printf("%T", x) // Prints float64
Back to the arbitrary precision:
Spec: Constants:
Numeric constants represent exact values of arbitrary precision and do not overflow.
So constants represent exact values of arbitrary precision. As we saw, there is no need to represent constants with arbitrary precision at runtime, but the compiler still has to do something at compile time. And it does!
Obviously "infinite" precision cannot be dealt with. But there is no need to, as the source code itself is not "infinite" (its size is finite). Still, allowing truly arbitrary precision is not practical, so the spec gives compilers some freedom in this regard:
Implementation restriction: Although numeric constants have arbitrary precision in the language, a compiler may implement them using an internal representation with limited precision. That said, every implementation must:
Represent integer constants with at least 256 bits.
Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed binary exponent of at least 16 bits.
Give an error if unable to represent an integer constant precisely.
Give an error if unable to represent a floating-point or complex constant due to overflow.
Round to the nearest representable constant if unable to represent a floating-point or complex constant due to limits on precision.
These requirements apply both to literal constants and to the result of evaluating constant expressions.
However, note that with all the above said, the standard library still provides you the means to represent and work with values (constants) of "arbitrary" precision: see the package go/constant. You may look into its source to get an idea how it's implemented.
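As a quick taste of that API, a minimal sketch that evaluates the question's Huge / 1e999 expression the way a compiler would, exactly, and only converts the result to float64 at the end:
package main

import (
	"fmt"
	"go/constant"
	"go/token"
)

func main() {
	huge := constant.MakeFromLiteral("1e1000", token.FLOAT, 0)
	div := constant.MakeFromLiteral("1e999", token.FLOAT, 0)
	q := constant.BinaryOp(huge, token.QUO, div) // exact constant arithmetic

	f, exact := constant.Float64Val(q) // convert only the result to float64
	fmt.Println(f, exact)              // 10 true
}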
Implementation is in go/constant/value.go. Types representing such values:
// A Value represents the value of a Go constant.
type Value interface {
	// Kind returns the value kind.
	Kind() Kind

	// String returns a short, human-readable form of the value.
	// For numeric values, the result may be an approximation;
	// for String values the result may be a shortened string.
	// Use ExactString for a string representing a value exactly.
	String() string

	// ExactString returns an exact, printable form of the value.
	ExactString() string

	// Prevent external implementations.
	implementsValue()
}
type (
	unknownVal struct{}
	boolVal    bool
	stringVal  string
	int64Val   int64                    // Int values representable as an int64
	intVal     struct{ val *big.Int }   // Int values not representable as an int64
	ratVal     struct{ val *big.Rat }   // Float values representable as a fraction
	floatVal   struct{ val *big.Float } // Float values not representable as a fraction
	complexVal struct{ re, im Value }
)
As you can see, the math/big package is used to represent untyped arbitrary precision values. big.Int is for example (from math/big/int.go):
// An Int represents a signed multi-precision integer.
// The zero value for an Int represents the value 0.
type Int struct {
	neg bool // sign
	abs nat  // absolute value of the integer
}
Where nat is (from math/big/nat.go):
// An unsigned integer x of the form
//
// x = x[n-1]*_B^(n-1) + x[n-2]*_B^(n-2) + ... + x[1]*_B + x[0]
//
// with 0 <= x[i] < _B and 0 <= i < n is stored in a slice of length n,
// with the digits x[i] as the slice elements.
//
// A number is normalized if the slice contains no leading 0 digits.
// During arithmetic operations, denormalized values may occur but are
// always normalized before returning the final result. The normalized
// representation of 0 is the empty or nil slice (length = 0).
//
type nat []Word
And finally, Word is (from math/big/arith.go):
// A Word represents a single digit of a multi-precision unsigned integer.
type Word uintptr
Summary
At runtime: predefined types provide limited precision, but you can "mimic" arbitrary precision with certain packages, such as math/big and go/constant. At compile time: constants seemingly provide arbitrary precision, but in reality a compiler may not live up to this (it doesn't have to); still, the spec sets a minimum precision for constants that all compilers must support, e.g. integer constants must be represented with at least 256 bits, which is 32 bytes (compared to int64, which is "only" 8 bytes).
When an executable binary is created, the results of constant expressions (computed with arbitrary precision) have to be converted to and represented as values of finite-precision types, which may not be possible and may thus result in compile-time errors. Note that only results, not intermediate operands, have to be converted to finite precision; constant operations themselves are carried out with arbitrary precision.
How this arbitrary or enhanced precision is implemented is not defined by the spec; math/big, for example, stores the "digits" of the number in a slice, where a "digit" is not a digit of the base-10 representation but an uintptr, which amounts to a base-4294967296 representation on 32-bit architectures, and an even bigger base on 64-bit architectures.
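A runtime counterpart of the earlier 1e1000 + 1 example, as a minimal sketch using math/big (the 4000-bit precision is an assumption chosen to comfortably cover the ~3322 bits the value needs):
package main

import (
	"fmt"
	"math/big"
)

func main() {
	x, _, err := big.ParseFloat("1e1000", 10, 4000, big.ToNearestEven)
	if err != nil {
		panic(err)
	}
	sum := new(big.Float).SetPrec(4000).Add(x, big.NewFloat(1))
	diff := new(big.Float).Sub(sum, x)
	fmt.Println(diff) // 1: no precision is lost this time
}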
Go constants are not allocated to memory. They are used in context by the compiler. The blog post you refer to gives the example of Pi:
Pi = 3.14159265358979323846264338327950288419716939937510582097494459
If you assign Pi to a float32 it will lose precision to fit, while if you assign it to a float64 it will lose less precision; in either case, the compiler determines the appropriate type from the context in which the constant is used.
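A minimal sketch of that, using the blog post's Pi constant:
package main

import "fmt"

const Pi = 3.14159265358979323846264338327950288419716939937510582097494459

func main() {
	var f32 float32 = Pi // rounded to the nearest float32
	var f64 float64 = Pi // rounded to the nearest float64
	fmt.Println(f32)     // 3.1415927
	fmt.Println(f64)     // 3.141592653589793
}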
