If you parse a string into a big.Float with f.SetString("0.001") and then multiply it, I see a loss of precision. If I use f.SetFloat64(0.001), I don't lose precision. Even calling strconv.ParseFloat("0.001", 64) and passing the result to f.SetFloat64() works.
Full example of what I'm seeing here:
https://play.golang.org/p/_AyTHJJBUeL
Expanded from this question: https://stackoverflow.com/a/47546136/105562
The difference in output is due to imprecise representation of base 10 floating point numbers in float64 (IEEE-754 format) and the default precision and rounding of big.Float.
See this simple code to verify:
fmt.Printf("%.30f\n", 0.001)
f, ok := new(big.Float).SetString("0.001")
fmt.Println(f.Prec(), ok)
Output of the above (try it on the Go Playground):
0.001000000000000000020816681712
64 true
So what we see is that the float64 value 0.001 is not exactly 0.001, and the default precision of big.Float is 64.
If you increase the precision of the big.Float before setting it from the string value, you will get the same output as with SetFloat64:
s := "0.001"
f := new(big.Float)
f.SetPrec(100)
f.SetString(s)
fmt.Println(s)
fmt.Println(BigFloatToBigInt(f))
Now output will also be the same (try it on the Go Playground):
0.001
1000000000000000
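The default precisions mentioned above can be checked directly. This is a minimal sketch; the helper names precAfterSetString, precAfterSetFloat64 and precWithExplicit are made up for illustration and are not part of the original playground code.

```go
package main

import (
	"fmt"
	"math/big"
)

// precAfterSetString reports the precision a zero-value big.Float ends up
// with after SetString (hypothetical helper for illustration).
func precAfterSetString(s string) uint {
	f, ok := new(big.Float).SetString(s)
	if !ok {
		panic("invalid number: " + s)
	}
	return f.Prec()
}

// precAfterSetFloat64 reports the precision a zero-value big.Float ends up
// with after SetFloat64.
func precAfterSetFloat64(x float64) uint {
	return new(big.Float).SetFloat64(x).Prec()
}

// precWithExplicit sets the precision first and only then parses the string,
// so the parsed value is rounded to prec bits instead of the default.
func precWithExplicit(s string, prec uint) uint {
	f, ok := new(big.Float).SetPrec(prec).SetString(s)
	if !ok {
		panic("invalid number: " + s)
	}
	return f.Prec()
}

func main() {
	fmt.Println(precAfterSetString("0.001"))    // 64
	fmt.Println(precAfterSetFloat64(0.001))     // 53
	fmt.Println(precWithExplicit("0.001", 100)) // 100
}
```

So the two setters do not start from the same precision, which is one reason the same decimal text can lead to different results after further arithmetic.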
In Go, it seems that when a float64 variable is first converted to float32 and then back to float64, its value changes.
a := -8888.95
fmt.Println(a) // -8888.95
fmt.Println(float32(a)) // -8888.95
fmt.Println(float64(float32(a))) // -8888.9501953125
How can I keep the value unchanged?
The way you have described the problem is perhaps misleading.
The precision is not lost "when converting float32 to float64"; rather, it is lost when converting from float64 to float32.
So how can you avoid losing precision when converting from float64 to float32? You can't. This task is impossible, and it's quite easy to see the reason why:
float64 has twice as many bits as float32
multiple different float64 values will map to the same float32 value due to the pigeonhole principle
the conversion is therefore not reversible.
package main
import (
"fmt"
)
func main() {
a := -8888.95
fmt.Printf("%.20f\n", a)
fmt.Printf("%.20f\n", float32(a))
fmt.Printf("%.20f\n", float64(float32(a)))
}
Adjusting your program to show a more precise output of each value, you'll see exactly where the precision is lost:
-8888.95000000000072759576
-8888.95019531250000000000
-8888.95019531250000000000
That is, after the float32 conversion (as is expected).
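A small check makes the non-reversibility concrete. This is only a sketch; survivesFloat32 is a hypothetical helper name, and the second test value is the float32 result printed above.

```go
package main

import "fmt"

// survivesFloat32 reports whether x is unchanged by a round trip through
// float32 (hypothetical helper for illustration).
func survivesFloat32(x float64) bool {
	return float64(float32(x)) == x
}

func main() {
	fmt.Println(survivesFloat32(-8888.95))         // false: bits are dropped
	fmt.Println(survivesFloat32(-8888.9501953125)) // true: already a float32 value
	fmt.Println(survivesFloat32(0.5))              // true: exact in both formats
}
```

Only values that already fit in float32 survive the round trip; everything else is rounded on the way down and stays rounded on the way back up.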
It's also worth noting that neither float64 nor float32 can represent your value -8888.95 exactly. If you convert this number to a fraction, you will get -177779/20. Notice the denominator, 20. The prime factorization of 20 is 2 * 2 * 5.
If you reduce a number to a fraction in lowest terms like this and the prime factorization of the denominator contains any factor other than 2, then the number is definitely not representable exactly in binary floating point form. You may discover that the probability of any decimal number passing this test is quite low.
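The denominator test can be sketched with math/big, which parses a decimal string into an exact fraction in lowest terms. The helper name powerOfTwoDenominator is an assumption made for this example. Note that passing the test is necessary but not sufficient: the numerator must also fit in the significand and the exponent must be in range.

```go
package main

import (
	"fmt"
	"math/big"
)

// powerOfTwoDenominator parses s as an exact fraction and reports whether
// its denominator (in lowest terms) is a power of two.
// Hypothetical helper for illustration.
func powerOfTwoDenominator(s string) bool {
	r, ok := new(big.Rat).SetString(s)
	if !ok {
		panic("invalid number: " + s)
	}
	d := r.Denom() // big.Rat keeps fractions normalized, so d > 0
	dMinus1 := new(big.Int).Sub(d, big.NewInt(1))
	// d is a power of two exactly when d AND (d-1) is zero.
	return new(big.Int).And(d, dMinus1).Sign() == 0
}

func main() {
	fmt.Println(powerOfTwoDenominator("-8888.95")) // false: denominator is 20
	fmt.Println(powerOfTwoDenominator("0.375"))    // true: 3/8
	fmt.Println(powerOfTwoDenominator("0.1"))      // false: 1/10
}
```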
If I run the following piece of Go code:
fmt.Println(float32(0.1) + float32(0.2))
fmt.Println(float64(0.1) + float64(0.2))
the output is:
0.3
0.30000000000000004
It appears the result of the float32 sum is more exact than the result of the float64 sum, why? I thought that float64 is always more precise than float32. How do I decide which one to pick to have the most accurate result?
It isn't. fmt.Println is just making it look more precise. Println uses %g for floating point and complex numbers. The docs say...
The default precision for... %g it is the smallest number of digits necessary to identify the value uniquely.
0.3 is sufficient to identify a float32. But float64 being much more precise needs more digits.
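The "smallest number of digits" rule can be reproduced with strconv.FormatFloat, where a precision of -1 asks for the shortest string that still identifies the value for the given bit size. A minimal sketch; the helper names are made up for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// add32 and add64 take parameters so the sums are computed at run time
// in the respective type (hypothetical helpers for illustration).
func add32(a, b float32) float32 { return a + b }
func add64(a, b float64) float64 { return a + b }

// shortest32 formats x with the fewest digits that identify it as a float32.
func shortest32(x float32) string {
	return strconv.FormatFloat(float64(x), 'g', -1, 32)
}

// shortest64 formats x with the fewest digits that identify it as a float64.
func shortest64(x float64) string {
	return strconv.FormatFloat(x, 'g', -1, 64)
}

func main() {
	fmt.Println(shortest32(add32(0.1, 0.2))) // 0.3
	fmt.Println(shortest64(add64(0.1, 0.2))) // 0.30000000000000004
	// The same float32 sum needs more digits once it is treated as a float64.
	fmt.Println(shortest64(float64(add32(0.1, 0.2))))
}
```

The short output for float32 reflects how few digits are needed to tell neighboring float32 values apart, not how close the value is to 0.3.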
We can use fmt.Printf and %0.20g to force both numbers to display the same precision.
f32 := float32(0.1) + float32(0.2)
f64 := float64(0.1) + float64(0.2)
fmt.Printf("%0.20g\n", f32)
fmt.Printf("%0.20g\n", f64)
0.30000001192092895508
0.30000000000000004441
float64 is more precise. Neither is exact, as that is the nature of floating point numbers.
We can use strconv.FormatFloat to see what these numbers really are.
fmt.Println(strconv.FormatFloat(float64(f32), 'b', -1, 32))
fmt.Println(strconv.FormatFloat(f64, 'b', -1, 64))
10066330p-25
5404319552844596p-54
That is 10066330 * 2^-25 and 5404319552844596 * 2^-54.
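To confirm that reading of the 'b' format, the values can be rebuilt from mantissa and exponent with math.Ldexp and compared to the sums. A sketch under the assumption that both mantissas fit in a float64 exactly, which they do here; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// sum32 and sum64 compute the sums at run time in the given type
// (hypothetical helpers for illustration).
func sum32(a, b float32) float32 { return a + b }
func sum64(a, b float64) float64 { return a + b }

// fromBinary rebuilds mantissa * 2^exp, the value the 'b' format describes.
func fromBinary(mantissa int64, exp int) float64 {
	return math.Ldexp(float64(mantissa), exp)
}

func main() {
	fmt.Println(fromBinary(10066330, -25) == float64(sum32(0.1, 0.2))) // true
	fmt.Println(fromBinary(5404319552844596, -54) == sum64(0.1, 0.2))  // true
}
```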
func main() {
target := 20190201518310870.0
fmt.Println(int64(target))
z3 := big.NewInt(int64(target))
fmt.Println(z3)
}
The result is 20190201518310872
How do I convert it without changing the value?
The problem is that even your input target number is not equal to the constant you assign to it.
The float64 type uses the IEEE 754 double-precision format, which has a finite number of bits: 64 in total, of which the significand has 53 bits of precision (52 stored plus one implicit leading bit). That is enough for roughly 15 to 17 significant decimal digits, and not every integer above 2^53 = 9007199254740992 is representable. Your 17-digit input is above that limit, so it is rounded to the nearest representable float64.
If you print target, you will see the exact number that is "transferred" to big.Int:
target := 20190201518310870.0
fmt.Printf("%f\n", target)
Outputs (try it on the Go Playground):
20190201518310872.000000
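Whether a particular integer survives storage in a float64 can be tested with a round trip. A minimal sketch; survivesFloat64 is a hypothetical helper and is only meant for values well inside the int64 range.

```go
package main

import "fmt"

// survivesFloat64 reports whether n is unchanged after being stored in a
// float64 and converted back (hypothetical helper for illustration;
// not meant for values near the int64 limits).
func survivesFloat64(n int64) bool {
	return int64(float64(n)) == n
}

func main() {
	fmt.Println(survivesFloat64(20190201518310870)) // false: rounded to ...872
	fmt.Println(survivesFloat64(20190201518310))    // true
	fmt.Println(survivesFloat64(1 << 53))           // true
	fmt.Println(survivesFloat64(1<<53 + 1))         // false: first integer that does not fit
}
```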
Note that it works if the input constant "fits" into float64:
target := 20190201518310.0
fmt.Printf("%f\n", target)
fmt.Println(int64(target))
z3 := big.NewInt(int64(target))
fmt.Println(z3)
Outputs (try it on the Go Playground):
20190201518310.000000
20190201518310
20190201518310
If you need to work with big numbers exactly such as 20190201518310870.0, you have to use another type to store it in the first place, e.g. string, big.Int or big.Float, but not float64.
For example:
target := "20190201518310870"
fmt.Println(target)
z3, ok := big.NewInt(0).SetString(target, 10)
fmt.Println(z3, ok)
Output (try it on the Go Playground):
20190201518310870
20190201518310870 true
Rounding positive value (example here: 1.015) half-up to 2 decimal places using math.Round() in Go:
fmt.Println(math.Round(1.015*100) / 100)
Go Playground
I got: 1.02. That's correct.
But when I employed a function to do the same job:
func RoundHalfUp(x float64) float64 {
return math.Round(x*100) / 100
}
Go Playground
I got 1.01.
What's wrong with the RoundHalfUp function?
The Go Programming Language
Specification
Constants
Numeric constants represent exact values of arbitrary precision and do
not overflow.
Implementation restriction: Although numeric constants have arbitrary
precision in the language, a compiler may implement them using an
internal representation with limited precision. That said, every
implementation must:
Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed
binary exponent of at least 16 bits.
Round to the nearest representable constant if unable to represent a floating-point or complex constant due to limits on precision.
These requirements apply both to literal constants and to the result
of evaluating constant expressions.
Constant expressions
Constant expressions may contain only constant operands and are
evaluated at compile time.
Constant expressions are always evaluated exactly; intermediate values
and the constants themselves may require precision significantly
larger than supported by any predeclared type in the language.
Implementation restriction: A compiler may use rounding while
computing untyped floating-point or complex constant expressions; see
the implementation restriction in the section on constants. This
rounding may cause a floating-point constant expression to be invalid
in an integer context, even if it would be integral when calculated
using infinite precision, and vice versa.
Implement the RoundHalfUp function the way the Go compiler evaluates math.Round(1.015*100) / 100. 1.015*100 is an untyped floating-point constant expression. Use the math/big package with at least 256 bits of precision. Go float64 (IEEE-754 64-bit floating-point) has 53 bits of precision.
For example, with 256 bits of precision (constant expression),
package main
import (
"fmt"
"math"
"math/big"
)
func RoundHalfUp(x string) float64 {
// math.Round(x*100) / 100
xf, _, err := big.ParseFloat(x, 10, 256, big.ToNearestEven)
if err != nil {
panic(err)
}
xf100, _ := new(big.Float).Mul(xf, big.NewFloat(100)).Float64()
return math.Round(xf100) / float64(100)
}
func main() {
fmt.Println(RoundHalfUp("1.015"))
}
Playground: https://play.golang.org/p/uqtYwP4o22B
Output:
1.02
If we only use 53 bits of precision (float64):
xf, _, err := big.ParseFloat(x, 10, 53, big.ToNearestEven)
Playground: https://play.golang.org/p/ejz-wkuycaU
Output:
1.01
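The difference between the constant expression and the run-time multiplication can also be reproduced without math/big. A minimal sketch; roundConst and roundRuntime are hypothetical names introduced for this comparison.

```go
package main

import (
	"fmt"
	"math"
)

// roundConst multiplies at compile time: 1.015*100 is an untyped constant
// expression, so math.Round receives exactly 101.5.
func roundConst() float64 {
	return math.Round(1.015*100) / 100
}

// roundRuntime multiplies at run time in float64: x is already the float64
// nearest to the argument, and x*100 lands just below 101.5 for 1.015.
func roundRuntime(x float64) float64 {
	return math.Round(x*100) / 100
}

func main() {
	fmt.Println(roundConst())        // 1.02
	fmt.Println(roundRuntime(1.015)) // 1.01
}
```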
As long as floating point is used, 0.1 cannot be represented exactly in memory, so we know that 0.1 + 0.2 usually comes out to 0.30000000000000004.
But when I use Go to add 0.1 and 0.2, I get 0.3.
fmt.Println(0.1 + 0.2)
// Output : 0.3
Why is 0.3 coming out instead of 0.30000000000000004 ?
It is because when you print it (e.g. with the fmt package), the printing function already rounds the output: by default it prints the shortest decimal string that uniquely identifies the float64 value.
See this example:
const ca, cb = 0.1, 0.2
fmt.Println(ca + cb)
fmt.Printf("%.20f\n", ca+cb)
var a, b float64 = 0.1, 0.2
fmt.Println(a + b)
fmt.Printf("%.20f\n", a+b)
Output (try it on the Go Playground):
0.3
0.29999999999999998890
0.30000000000000004
0.30000000000000004441
First we used constants because that's different than using (non-constant) values of type float64. Numeric constants represent exact values of arbitrary precision and do not overflow.
But when printing the result of ca+cb, the constant value has to be converted to a non-constant, typed value so it can be passed to fmt.Println(). This value will be of type float64, which cannot represent 0.3 exactly. fmt.Println() prints the shortest decimal string that uniquely identifies that float64, which is 0.3. When we explicitly ask for 20 fraction digits, we see that it is not exact. Note that only 0.3 is converted to float64, because the constant arithmetic 0.1+0.2 is evaluated by the compiler (at compile time).
Next we started with variables of type float64, and to no surprise, the output wasn't exactly 0.3, but this time even the default formatting gave a result different from 0.3. In the first case (constants) it was 0.3 that was converted, but this time both 0.1 and 0.2 were converted to float64, neither of which is exact, and adding them produced a float64 different from the one nearest to 0.3, so the fmt package needs more digits to identify it.
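The two cases can be compared directly against the float64 nearest to 0.3. A minimal sketch; constSum and varSum are hypothetical helpers, and the constants mirror ca and cb from the example above.

```go
package main

import "fmt"

const ca, cb = 0.1, 0.2

// constSum adds the untyped constants at compile time; only the final
// result is converted to float64.
func constSum() float64 { return ca + cb }

// varSum adds two float64 values at run time; both inputs were already
// rounded to float64 before the addition.
func varSum(a, b float64) float64 { return a + b }

func main() {
	fmt.Println(constSum() == 0.3)       // true
	fmt.Println(varSum(0.1, 0.2) == 0.3) // false
}
```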
Check out similar / relevant questions+answers to know more about the topic:
Why do these two float64s have different values?
How does Go perform arithmetic on constants?
Golang converting float64 to int error
Does go compiler's evaluation differ for constant expression and other expression
Why does adding 0.1 multiple times remain lossless?
Golang Round to Nearest 0.05
Go: Converting float64 to int with multiplier