How can I truncate float64 number to a particular precision? - go

I want to truncate 1.234567 to a floating-point number with 3 fraction digits, but the result is rounded instead of truncated.
E.g: 1.234567 => 1.234
package main
import (
"strconv"
"fmt"
)
func main() {
f := 1.234567
fmt.Println(strconv.FormatFloat(f, 'f', 3, 64)) //1.235
fmt.Printf("%.3f", f) //1.235
}
Can anyone tell me how to do this in Go?

The naive way (not always correct)
For truncation, we could take advantage of math.Trunc() which throws away the fraction digits. This is not exactly what we want, we want to keep some fraction digits. So in order to achieve what we want, we may first multiply the input by a power of 10 to shift the wanted fraction digits to the "integer" part, and after truncation (calling math.Trunc() which will throw away the remaining fraction digits), we can divide by the same power of 10 we multiplied in the beginning:
f2 := math.Trunc(f*1000) / 1000
Wrapping this into a function:
func truncateNaive(f float64, unit float64) float64 {
return math.Trunc(f/unit) * unit
}
Testing it:
f := 1.234567
f2 := truncateNaive(f, 0.001)
fmt.Printf("%.10f\n", f2)
Output:
1.2340000000
So far so good, but note that we perform arithmetic operations inside truncateNaive() which may result in unwanted roundings, which could alter the output of the function.
For example, if the input is 0.299999999999999988897769753748434595763683319091796875 (it's exactly representable as a float64 value, see proof), the output should be 0.2990000000, but it will be something else:
f = 0.299999999999999988897769753748434595763683319091796875
f2 = truncateNaive(f, 0.001)
fmt.Printf("%.10f\n", f2)
Output:
0.3000000000
Try these on the Go Playground.
This wrong output is probably not acceptable in most cases (unless you take the view that the input is very close to 0.3, the difference being less than 10^-16, in which case an output of 0.3 is reasonable).
Using big.Float
To properly truncate all valid float64 values, the intermediate operations must be precise. To achieve that, using a single float64 is insufficient. There are ways to split the input into 2 float64 values and perform operations on them (so precision is not lost) which would be more efficient, or we could use a more convenient way, big.Float which can be of arbitrary precision.
Here's a translation of the above truncateNaive() function to big.Float:
func truncate(f float64, unit float64) float64 {
bf := big.NewFloat(0).SetPrec(1000).SetFloat64(f)
bu := big.NewFloat(0).SetPrec(1000).SetFloat64(unit)
bf.Quo(bf, bu)
// Truncate:
i := big.NewInt(0)
bf.Int(i)
bf.SetInt(i)
f, _ = bf.Mul(bf, bu).Float64()
return f
}
Testing it:
f := 1.234567
f2 := truncate(f, 0.001)
fmt.Printf("%.10f\n", f2)
f = 0.299999999999999988897769753748434595763683319091796875
f2 = truncate(f, 0.001)
fmt.Printf("%.10f\n", f2)
Output is now valid (try it on the Go Playground):
1.2340000000
0.2990000000

You need to truncate decimals manually, either on string level or with math.Floor like https://play.golang.org/p/UP2gFx2iFru.
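The string-level approach mentioned above could look roughly like the sketch below. The helper name truncateString is made up for illustration, and the sketch assumes that cutting the shortest round-trip decimal text of the value is what is wanted:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// truncateString is a hypothetical helper (not from the answers above).
// It formats f with the shortest decimal text that parses back to f,
// then cuts that text after `digits` fraction digits without rounding.
func truncateString(f float64, digits int) string {
	s := strconv.FormatFloat(f, 'f', -1, 64)
	dot := strings.IndexByte(s, '.')
	if dot < 0 || len(s)-dot-1 <= digits {
		return s // no fraction part, or already short enough
	}
	if digits <= 0 {
		return s[:dot] // drop the decimal point as well
	}
	return s[:dot+1+digits]
}

func main() {
	fmt.Println(truncateString(1.234567, 3)) // 1.234
	fmt.Println(truncateString(1.2, 3))      // 1.2
	fmt.Println(truncateString(42, 3))       // 42
}
```

Note that this works on the shortest decimal text, not on the exact binary value: the float64 nearest to 0.3 is printed as "0.3", so this helper returns "0.3" for it, whereas the big.Float version above returns 0.299. Which of the two is "right" depends on how the input is meant to be interpreted.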

Related

Go/Golang: how to extract least significant digits from big.Float?

In Go/Golang I have a variable of type big.Float with an (arbitrary) precision of 3,324,000 to represent a decimal number of 1,000,000 digits. It's the result of an iteration to calculate pi.
Now I want to print out the least significant 100 digits, i.e. digits 999,900 to 1,000,000.
I tried to convert the variable to a string by using fmt.Sprintf() and (*big.Float).Text(). However, both consume a lot of processing time, which becomes unacceptable (many hours or even days) when raising the precision further.
I'm searching for some functions which extract the last 100 (decimal) digits of the variable.
Thanks in advance for your kind support.
The standard library doesn't provide a function to return those digits efficiently, but you can calculate them.
It is more efficient to isolate the digits you are interested in and print them. This avoids excessive calculations of an extremely large number to determine each individual digit.
The code below shows a way it can be done. You will need to ensure you have enough precision to generate them accurately.
package main
import (
"fmt"
"math"
"math/big"
)
func main() {
// Replace with larger calculation.
pi := big.NewFloat(math.Pi)
const (
// Pi: 3.1415926535897932...
// Output: 5926535897
digitOffset = 3
digitLength = 10
)
// Move the desired digits to the right side of the decimal point.
mult := pow(10, digitOffset)
digits := new(big.Float).Mul(pi, mult)
// Remove the integer component.
digits.Sub(digits, trunc(digits))
// Move the digits to the left of the decimal point, and truncate
// to an integer representing the desired digits.
// This avoids undesirable rounding if you simply print the N
// digits after the decimal point.
mult = pow(10, digitLength)
digits.Mul(digits, mult)
digits = trunc(digits)
// Display the next 'digitLength' digits. Zero padded.
fmt.Printf("%0*.0f\n", digitLength, digits)
}
// trunc returns the integer component.
func trunc(n *big.Float) *big.Float {
// Int truncates toward zero; the accuracy result is not needed here.
intPart, _ := n.Int(nil)
return new(big.Float).SetInt(intPart)
}
// pow calculates n^idx.
func pow(n, idx int64) *big.Float {
if idx < 0 {
panic("invalid negative exponent")
}
result := new(big.Int).Exp(big.NewInt(n), big.NewInt(idx), nil)
return new(big.Float).SetInt(result)
}
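On the precision requirement: each decimal digit needs log2(10), roughly 3.32, bits of mantissa. The sketch below estimates the bit count for a given number of digits. The helper name bitsForDigits and the guard parameter are assumptions for illustration; how many guard bits a particular pi iteration needs is not covered here.

```go
package main

import (
	"fmt"
	"math"
)

// bitsForDigits is a hypothetical helper: it estimates the mantissa bits
// needed to hold `digits` decimal digits, plus `guard` spare bits to
// absorb rounding errors accumulated during the calculation.
func bitsForDigits(digits int, guard uint) uint {
	return uint(math.Ceil(float64(digits)*math.Log2(10))) + guard
}

func main() {
	need := bitsForDigits(1000000, 0)
	fmt.Println(need)            // 3321929
	fmt.Println(need <= 3324000) // true
}
```

The precision of 3,324,000 bits from the question is therefore enough to hold 1,000,000 digits, with about 2,000 bits to spare.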

How to store a big float64 in a string without overflow?

func main() {
target := 20190201518310870.0
fmt.Println(int64(target))
z3 := big.NewInt(int64(target))
fmt.Println(z3)
}
The result is 20190201518310872
How do I convert it without overflow?
The problem is that even your input target number is not equal to the constant you assign to it.
The float64 type uses the double-precision floating-point format (IEEE 754) to store the number, which has a finite number of bits (64 in total, of which 52 are stored for the significand, giving 53 bits of precision with the implicit leading bit). This means integers are only guaranteed to be exact up to 2^53 = 9007199254740992, a 16-digit number. Your input has 17 digits and exceeds that limit, so it is rounded to the nearest representable float64.
If you print target, you will see the exact number that is passed on to big.Int:
target := 20190201518310870.0
fmt.Printf("%f\n", target)
Outputs (try it on the Go Playground):
20190201518310872.000000
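The rounded value can be explained by the spacing between neighbouring float64 values at this magnitude. The following sketch measures that spacing with math.Nextafter; the helper name ulp is made up for illustration:

```go
package main

import (
	"fmt"
	"math"
)

// ulp is a hypothetical helper: it returns the gap between x and the
// next larger representable float64.
func ulp(x float64) float64 {
	return math.Nextafter(x, math.Inf(1)) - x
}

func main() {
	fmt.Println(ulp(1 << 53))             // 2
	fmt.Println(ulp(20190201518310870.0)) // 4
}
```

Between 2^54 and 2^55 only multiples of 4 are representable. The constant 20190201518310870 lies exactly halfway between 20190201518310868 and 20190201518310872, and the tie is resolved towards the value with the even significand, which is 20190201518310872.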
Note that it works if the input constant "fits" into float64:
target := 20190201518310.0
fmt.Printf("%f\n", target)
fmt.Println(int64(target))
z3 := big.NewInt(int64(target))
fmt.Println(z3)
Outputs (try it on the Go Playground):
20190201518310.000000
20190201518310
20190201518310
If you need to work with big numbers exactly such as 20190201518310870.0, you have to use another type to store it in the first place, e.g. string, big.Int or big.Float, but not float64.
For example:
target := "20190201518310870"
fmt.Println(target)
z3, ok := big.NewInt(0).SetString(target, 10)
fmt.Println(z3, ok)
Output (try it on the Go Playground):
20190201518310870
20190201518310870 true

Comparing floats by ignoring last bit in golang

A specification reads as follows:
It still considers real numbers equal if they differ in their last
binary digit.
I would like to implement this way of comparing floats for the float64 data type in Go. Unfortunately, the bitwise operators aren't defined for floating point numbers. Is there a way to achieve this way of comparing floats in the Go language?
This looks like a perfect use case for math.Nextafter from the math package:
func equal(x, y float64) bool {
return math.Nextafter(x, y) == y
}
Nextafter returns the next representable float64 value after x towards y.
Special cases are:
Nextafter(x, x) = x
Nextafter(NaN, y) = NaN
Nextafter(x, NaN) = NaN
https://play.golang.org/p/unRkkoe6wb
If you want to know if two float64 values are adjacent (that is, there's no float64 value between them):
func almostEqual(a, b float64) bool {
ai, bi := int64(math.Float64bits(a)), int64(math.Float64bits(b))
return a == b || -1 <= ai-bi && ai-bi <= 1
}
Mostly that's the same as saying they differ in the lowest bit of their mantissa.
This code doesn't work if a or b are NaNs, zeros or infinities, but you could add special cases if you wished.
See https://randomascii.wordpress.com/2012/01/23/stupid-float-tricks-2/
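The special cases mentioned above could be handled along these lines. This is only a sketch: the name adjacentOrEqual is made up, and the decisions to treat NaN as never equal and an infinity as never adjacent to a finite value are assumptions, not something the specification in the question states.

```go
package main

import (
	"fmt"
	"math"
)

// adjacentOrEqual is a hypothetical helper. It reports whether a and b are
// equal or direct neighbours among the float64 values.
func adjacentOrEqual(a, b float64) bool {
	if math.IsNaN(a) || math.IsNaN(b) {
		return false // assumption: NaN never compares equal
	}
	if a == b {
		return true // also covers +0 == -0 and equal infinities
	}
	if math.IsInf(a, 0) || math.IsInf(b, 0) {
		return false // assumption: infinity is not "almost" a finite value
	}
	// One step from a towards b must land exactly on b.
	return math.Nextafter(a, b) == b
}

func main() {
	x, y := 0.1, 0.2 // variables, so the sum is computed in float64
	fmt.Println(adjacentOrEqual(x+y, 0.3))
	fmt.Println(adjacentOrEqual(1, 1.0000000000000004)) // false: two steps apart
}
```

Because Nextafter steps correctly across zero and through subnormal values, no bit manipulation is needed for those cases.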

Is there any standard library to convert float64 to string with fix width with maximum number of significant digits?

Imagine we need to print float64 numbers in a table with a fixed column width of 12:
fmt.Printf("%12.6g\n", 9.405090880450127e+119) //"9.40509e+119"
fmt.Printf("%12.6g\n", 0.1234567890123) //" 0.123457"
fmt.Printf("%12.6g\n", 123456789012.0) //" 1.23457e+11"
We prefer 0.1234567890 to " 0.123457", because the latter drops significant digits that would fit in the column.
We prefer 123456789012 to " 1.23457e+11", because the latter loses 6 significant digits.
Is there any standard library function to convert a float64 to a fixed-width string with the maximum number of significant digits?
Thanks in Advance.
Basically you have 2 output formats: either a scientific notation or a regular form. The turning point between those 2 formats is 1e12.
So you can branch if x >= 1e12. In both branches you may do a formatting with 0 fraction digits to see how long the number will be, so you can calculate how many fraction digits will fit in for 12 width, and so you can construct the final format string, using the calculated precision.
The pre-check is required in the scientific notation too (%g), because the width of exponent may vary (e.g. e+1, e+10, e+100).
Here is an example implementation. This is to get you started: it is not meant to handle all cases (negative numbers, for example, are not considered), and it is not the most efficient solution, but it is relatively simple and does the job:
// format12 formats x to be 12 chars long.
func format12(x float64) string {
if x >= 1e12 {
// Check to see how many fraction digits fit in:
s := fmt.Sprintf("%.g", x)
format := fmt.Sprintf("%%12.%dg", 12-len(s))
return fmt.Sprintf(format, x)
}
// Check to see how many fraction digits fit in:
s := fmt.Sprintf("%.0f", x)
if len(s) == 12 {
return s
}
format := fmt.Sprintf("%%%d.%df", len(s), 12-len(s)-1)
return fmt.Sprintf(format, x)
}
Testing it:
fs := []float64{0, 1234.567890123, 0.1234567890123, 123456789012.0, 1234567890123.0,
9.405090880450127e+9, 9.405090880450127e+19, 9.405090880450127e+119}
for _, f := range fs {
fmt.Println(format12(f))
}
Output (try it on the Go Playground):
0.0000000000
1234.5678901
0.1234567890
123456789012
1.234568e+12
9405090880.5
9.405091e+19
9.40509e+119
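As a possible simplification, fmt accepts * in place of the precision, which takes the value from the argument list. The variant below uses that instead of building a format string first. The name format12Star is made up, and the sketch keeps the same limitations as format12 (non-negative inputs only, no handling of values that round up to 13 digits):

```go
package main

import "fmt"

// format12Star is a hypothetical variant of format12 that passes the
// computed precision through the '*' placeholder.
func format12Star(x float64) string {
	if x >= 1e12 {
		// Probe the length with minimal precision, e.g. "1e+12".
		s := fmt.Sprintf("%.g", x)
		return fmt.Sprintf("%12.*g", 12-len(s), x)
	}
	// Probe the length of the integer part.
	s := fmt.Sprintf("%.0f", x)
	if len(s) == 12 {
		return s
	}
	// One character is reserved for the decimal point.
	return fmt.Sprintf("%.*f", 12-len(s)-1, x)
}

func main() {
	fmt.Println(format12Star(1234.567890123))         // 1234.5678901
	fmt.Println(format12Star(1234567890123.0))        // 1.234568e+12
	fmt.Println(format12Star(9.405090880450127e+119)) // 9.40509e+119
}
```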

GoLang for loop with floats creates error

Can someone explain the following. I have a function in go that accepts a couple of float64 and then uses this value to calculate a lot of other values. The function looks like
func (g *Geometry) CalcStresses(x, zmax, zmin float64)(Vertical)
the result is put into a struct like
type Vertical struct {
X float64
Stresses []Stress
}
Now the funny thing is this. If I call the function like this;
for i:=14.0; i<15.0; i+=0.1{
result := geo.CalcStresses(i, 10, -10)
}
then I get a lot of results where the Stresses slice is empty. Another interesting detail is that x sometimes shows up as a number with a LOT of decimals (like 14.3999999999999999998).
However, if I call the function like this;
for i:=0; i<10; i++{
x := 14.0 + float64(i) * 0.1
result := geo.CalcStresses(x,10,-10)
}
then everything is fine.
Does anyone know why this happens?
Thanks in advance,
Rob
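Not all real numbers can be represented precisely in binary floating-point format, so using a floating-point value as a loop counter and repeatedly adding a step such as 0.1 to it is asking for trouble: the rounding errors accumulate with each iteration.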
From Wikipedia on Floating point
The fact that floating-point numbers cannot precisely represent all real numbers, and that floating-point operations cannot precisely represent true arithmetic operations, leads to many surprising situations. This is related to the finite precision with which computers generally represent numbers.
For example, the non-representability of 0.1 and 0.01 (in binary) means that the result of attempting to square 0.1 is neither 0.01 nor the representable number closest to it.
This code
for i := 14.0; i < 15.0; i += 0.1 {
fmt.Println(i)
}
produces this
14
14.1
14.2
14.299999999999999
14.399999999999999
14.499999999999998
14.599999999999998
14.699999999999998
14.799999999999997
14.899999999999997
14.999999999999996
You may use the big.Rat type from the math/big package to represent rational numbers exactly.
Example
x := big.NewRat(14, 1)
y := big.NewRat(15, 1)
z := big.NewRat(1, 10)
for i := x; i.Cmp(y) < 0; i = i.Add(i, z) {
v, _ := i.Float64()
fmt.Println(v)
}
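If exact rational arithmetic is more than is needed, the integer-counter loop from the question can be wrapped in a small helper. This is a sketch; the name steps is made up. Each value is computed from the index, so it carries at most the rounding of one multiplication and one addition instead of an error that grows with every iteration, and the number of iterations is fixed by the integer n.

```go
package main

import "fmt"

// steps is a hypothetical helper: it returns the n values
// start, start+step, start+2*step, ... computed from the integer index
// rather than by repeated addition.
func steps(start, step float64, n int) []float64 {
	out := make([]float64, n)
	for i := range out {
		out[i] = start + float64(i)*step
	}
	return out
}

func main() {
	for _, x := range steps(14.0, 0.1, 10) {
		fmt.Println(x)
	}
}
```

The individual values may still not be exactly representable (14.1 is not, for instance), but the loop always runs exactly 10 times.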
