Divide 2 big.Floats in go and preserve the resulting decimal - go

I have 2 big.Float numbers in go.
x. 93214.310998100256907925
y. 1.954478300965909786
I want to find out what percent of x y is. This should be around 0.0020967578%. The problem is that when I divide these two big.Floats I expect an answer of the form 0.xxx, but the method returns 2.09675776180520477879426e-05. Any ideas how to fix this? I have tried converting to a string and back, but that is another rabbit hole I won't go into because I wasn't able to accomplish anything with it. I feel like there is a method for this that I am missing. I really only need 7 decimals of precision, if that helps.

Why do you need big.Float? Even float32 seems to be fine:
package main
import "fmt"
func percent(y, x float32) float32 {
	return y / x * 100
}
func main() {
	p := percent(1.954478300965909786, 93214.310998100256907925)
	fmt.Println(p) // 0.0020967575
}
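If big.Float is genuinely required, the scientific notation comes from the default formatting rather than from the division itself. Here is a minimal sketch, assuming the inputs arrive as decimal strings and that 200 bits of mantissa is enough (both are assumptions, and percentOf is a hypothetical helper name). It divides with Quo, scales by 100, and formats with Text('f', n) to get a fixed number of decimals:

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

// percentOf returns y as a percentage of x, formatted with a fixed number
// of decimal places instead of scientific notation.
// Hypothetical helper: the name and the string-based signature are assumptions.
func percentOf(xs, ys string, decimals int) (string, error) {
	const prec = 200 // mantissa bits; an assumed, generous working precision

	x, _, err := big.ParseFloat(xs, 10, prec, big.ToNearestEven)
	if err != nil {
		return "", err
	}
	y, _, err := big.ParseFloat(ys, 10, prec, big.ToNearestEven)
	if err != nil {
		return "", err
	}
	if x.Sign() == 0 {
		return "", errors.New("division by zero")
	}

	q := new(big.Float).SetPrec(prec).Quo(y, x)
	q.Mul(q, big.NewFloat(100))

	// 'f' selects plain decimal notation with the given number of decimals.
	return q.Text('f', decimals), nil
}

func main() {
	p, err := percentOf("93214.310998100256907925", "1.954478300965909786", 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p + "%")
}
```

The same Text('f', n) call works on any big.Float, so an existing Quo result can be formatted this way without going through strings first.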

Related

Float Arithmetic inconsistent between golang programs

When decoding audio files with pion/opus I will occasionally get values that are incorrect.
I have debugged it down to the following code. When this routine runs inside the Opus decoder I get a different value than when I run it outside. When the two floats are added together, the rightmost bit is different. The difference eventually becomes a problem as the program runs longer.
Is this a bug or expected behavior? I don't know how to debug this deeper/dump state of my program to understand more.
Outside decoder
package main
import (
	"fmt"
	"math"
)
func main() {
	a := math.Float32frombits(uint32(955684399))
	b := math.Float32frombits(uint32(927295728))
	fmt.Printf("%b\n", math.Float32bits(a))
	fmt.Printf("%b\n", math.Float32bits(b))
	fmt.Printf("%b\n", math.Float32bits(a+b))
}
Returns
111000111101101001011000101111
110111010001010110100011110000
111001000001111010000110100110
Then Inside decoder
fmt.Printf("%b\n", math.Float32bits(lpcVal))
fmt.Printf("%b\n", math.Float32bits(val))
fmt.Printf("%b\n", math.Float32bits(lpcVal+val))
Returns
111000111101101001011000101111
110111010001010110100011110000
111001000001111010000110100111
I guess that lpcVal and val are not float32 but rather float64.
If that is the case, then you are comparing two different operations:
in the former case, you compute float32(lpcVal) + float32(val), rounding each operand before the addition
in the latter case, you compute float32(lpcVal + val), rounding only the sum
the two 32 bits floats are in binary:
1.11101101001011000101111 * 2^-14
1.10001010110100011110000 * 2^-17
The exact sum is
1.000011110100001101001101 * 2^-13
which is an exact tie between two representable Float32
the result is rounded to the Float32 with even significand
1.00001111010000110100110 * 2^-13
But if lpcVal and val are float64, then instead of 23 bits after the binary point they have 52 (29 more).
If a single bit among those 29 extra bits is different from zero, the result might not be an exact tie, but slightly larger than the exact tie.
Once converted to nearest Float32, that will be
1.00001111010000110100111 * 2^-13
Since we have no idea what lpcVal and val contain in those low-order bits, anything can happen, even without the use of FMA operations.
This was happening because of fused multiply-add (FMA): multiple floating-point operations were being combined into one operation.
You can read more about it in the Go Language Specification, under Floating-point operators.
The change I made to my code was
- lpcVal += currentLPCVal * (aQ12 / 4096.0)
+ lpcVal = float32(lpcVal) + float32(currentLPCVal)*float32(aQ12)/float32(4096.0)
Thank you to Bryan C. Mills for answering this on the #performance channel on the Gophers slack.

Is there a simple method for square root of big.Rat?

I need to find the square root of a big.Rat. Is there a way to do it without losing (already existing) accuracy?
For example, I could convert the numerator and denominator into floats, get the square root, and then convert it back...
func ratSquareRoot(num *big.Rat) *big.Rat {
	f, _ := num.Float64() // Yuck! Floats! (the "exact" flag is ignored here)
	squareRoot := math.Sqrt(f)
	// Roughly 15 significant decimal digits fit in a float64.
	// Note: 10 ^ 15 is XOR in Go (it equals 5), so the power of ten is written as a constant.
	var accuracy int64 = 1e15
	return big.NewRat(int64(squareRoot*float64(accuracy)), accuracy)
	// ^ This is now totally worthless. And also probably not simplified very well.
}
...but that would eliminate all of the accuracy of using a rational. Is there a better way of doing this?
The big.Float type has a .Sqrt(x) operation, and handles defining explicitly the precision you aim for. I'd try to use that and convert the result back to a Rat with the same operations in your question, only manipulating big.Int values.
r := big.NewRat(1, 3)
var x big.Float
x.SetPrec(30) // I didn't figure out the 'Prec' part correctly; read the docs more carefully than I did and experiment
x.SetRat(r)
var s big.Float
s.SetPrec(15)
s.Sqrt(&x)
r, _ = s.Rat(nil)
fmt.Println(x.String(), s.String())
fmt.Println(r.String(), float64(18919)/float64(32768))

golang losing precision while converting float32 to float64?

In Go, it seems that when a float64 variable is first converted to float32 and then back to float64, its value changes.
a := -8888.95
fmt.Println(a) // -8888.95
fmt.Println(float32(a)) // -8888.95
fmt.Println(float64(float32(a))) // -8888.9501953125
How can I keep the value unchanged?
The way you have described the problem is perhaps misleading.
The precision is not lost "when converting float32 to float64"; rather, it is lost when converting from float64 to float32.
So how can you avoid losing precision when converting from float64 to float32? You can't. This task is impossible, and it's quite easy to see the reason why:
float64 has twice as many bits as float32
multiple different float64 values will map to the same float32 value due to the pigeonhole principle
the conversion is therefore not reversible.
Adjusting your program to show a more precise output of each value shows exactly where the precision is lost:
package main
import (
	"fmt"
)
func main() {
	a := -8888.95
	fmt.Printf("%.20f\n", a)
	fmt.Printf("%.20f\n", float32(a))
	fmt.Printf("%.20f\n", float64(float32(a)))
}
This prints:
-8888.95000000000072759576
-8888.95019531250000000000
-8888.95019531250000000000
That is, after the float32 conversion (as is expected).
It's also worth noting that neither float64 nor float32 can represent your value -8888.95 exactly. If you convert this number to a fraction, you will get -177779/20. Notice the denominator, 20. The prime factorization of 20 is 2 * 2 * 5.
If you apply this process to a number and the prime factorization of the denominator (with the fraction in lowest terms) contains any factor other than 2, then that number is definitely not exactly representable in binary floating point. You may discover that the probability of any given decimal number passing this test is quite low.

GO: manipulating random generated float64

I was wondering whether we can tell the random generator how many digits to generate after the decimal point.
Example of default behaviour:
fmt.Println(rand.Float64())
Would print out the number 0.6046602879796196
Desired behaviour:
fmt.Println(rand.Float64(4))
Would then print out the number 0.6047.
Does this functionality already exist in Go, or would I have to implement it myself?
Thank you!
It sounds like only the string representation is important to you, and the fmt package does provide that for you:
fmt.Printf("%1.4f", rand.Float64())
So yes, you would still need to wrap this call to specify the number of digits after the decimal point.
func RandomDigits(number int) string {
	return fmt.Sprintf("%1."+strconv.Itoa(number)+"f", rand.Float64())
}
I don't know of such a function; however, it is easy to implement yourself:
// Truncate the number x to n decimal places
//
// +- Inf -> +- Inf; NaN -> NaN
func truncate(x float64, n int) float64 {
	return math.Trunc(x*math.Pow(10, float64(n))) * math.Pow(10, -float64(n))
}
Shift the number n decimal places to the left, truncate the fractional part, then shift it n places back to the right.
If you want to present the number to the user, you will at some point convert it to a string. When you do that, you should not use this method; use string formatting instead, as pointed out by Tyson. Because floating-point numbers are imprecise, truncation can leave rounding errors, for example:
truncate(0.9405090880450124,3) // 0.9400000000000001
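For the display case, strconv.FormatFloat takes the number of decimals as an ordinary argument, so no format string has to be assembled. A minimal sketch (formatDecimals is a hypothetical wrapper name); note that it rounds to the nearest value rather than truncating:

```go
package main

import (
	"fmt"
	"strconv"
)

// formatDecimals renders x with exactly n digits after the decimal point,
// rounding to nearest.
func formatDecimals(x float64, n int) string {
	return strconv.FormatFloat(x, 'f', n, 64)
}

func main() {
	fmt.Println(formatDecimals(0.6046602879796196, 4)) // 0.6047
	fmt.Println(formatDecimals(0.9405090880450124, 3)) // 0.941
}
```

The second call shows the difference from truncate: rounding gives 0.941, whereas truncation gives a value near 0.940.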

GoLang for loop with floats creates error

Can someone explain the following. I have a function in go that accepts a couple of float64 and then uses this value to calculate a lot of other values. The function looks like
func (g *Geometry) CalcStresses(x, zmax, zmin float64)(Vertical)
the result is put into a struct like
type Vertical struct {
	X        float64
	Stresses []Stress
}
Now the funny thing is this. If I call the function like this;
for i := 14.0; i < 15.0; i += 0.1 {
	result := geo.CalcStresses(i, 10, -10)
}
then I get a lot of results where the Stresses slice is empty. Another interesting detail is that x sometimes shows up as a number with a LOT of decimals (like 14.3999999999999999998).
However, if I call the function like this;
for i := 0; i < 10; i++ {
	x := 14.0 + float64(i)*0.1
	result := geo.CalcStresses(x, 10, -10)
}
then everything is fine.
Does anyone know why this happens?
Thanks in advance,
Rob
Not all real numbers can be represented precisely in binary floating-point format, so using a floating-point number as a loop counter is asking for trouble.
From Wikipedia on Floating point
The fact that floating-point numbers cannot precisely represent all real numbers, and that floating-point operations cannot precisely represent true arithmetic operations, leads to many surprising situations. This is related to the finite precision with which computers generally represent numbers.
For example, the non-representability of 0.1 and 0.01 (in binary) means that the result of attempting to square 0.1 is neither 0.01 nor the representable number closest to it.
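The squaring example from the quote can be checked directly in Go. This is a small sketch; squareEquals is a hypothetical helper, and taking the values as parameters keeps the arithmetic in float64 rather than in Go's exact constant arithmetic.

```go
package main

import "fmt"

// squareEquals reports whether x*x, computed in float64, is exactly
// equal to want.
func squareEquals(x, want float64) bool {
	return x*x == want
}

func main() {
	fmt.Println(squareEquals(0.1, 0.01)) // false: 0.1 is not representable
	fmt.Println(squareEquals(0.5, 0.25)) // true: both are powers of two
}
```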
This code
for i := 14.0; i < 15.0; i += 0.1 {
	fmt.Println(i)
}
produces this
14
14.1
14.2
14.299999999999999
14.399999999999999
14.499999999999998
14.599999999999998
14.699999999999998
14.799999999999997
14.899999999999997
14.999999999999996
You may use the big.Rat type from the math/big package to represent rational numbers exactly.
Example
x := big.NewRat(14, 1)
y := big.NewRat(15, 1)
z := big.NewRat(1, 10)
for i := x; i.Cmp(y) < 0; i = i.Add(i, z) {
	v, _ := i.Float64()
	fmt.Println(v)
}
