Go: How to check precision loss when converting float64 to float32

I have a scenario where I receive a float64 value, but must send it down the wire to another service as a float32 value. We know the received value should always fit into a float32. However, to be safe I want to log the case where we are losing data by converting to float32.
This code block does not compile, since you can't compare float32 to float64 directly.
func convert(input float64) (output float32, err error) {
const tolerance = 0.001
output = float32(input)
if output > input+tolerance || output < input-tolerance {
return 0, errors.New("lost too much precision")
}
return output, nil
}
Is there an easy way to check that I am hitting this condition? This check will happen at high frequency, so I want to avoid doing string conversions.

You can convert the float32 value back to float64, just for the validation.
To check whether the converted value represents exactly the same number, compare it to the original value (the input). It is also sufficient, and idiomatic, to return an ok bool instead of an error:
func convert(input float64) (output float32, ok bool) {
output = float32(input)
ok = float64(output) == input
return
}
(Note: edge cases like NaN are not checked.)
Testing it:
fmt.Println(convert(1))
fmt.Println(convert(1.5))
fmt.Println(convert(0.123456789))
fmt.Println(convert(math.MaxFloat32))
Output (try it on the Go Playground):
1 true
1.5 true
0.12345679 false
3.4028235e+38 true
Note that this will often report ok = false, because float32 has less precision than float64, even though the converted value may be very close to the input.
So in practice it is more useful to check how large the difference is. Your proposed solution checks the absolute difference, which is not very useful: for example, 1000000.1 and 1000000 are very close numbers relative to their size, even though the difference is 0.1. On the other hand, 0.0001 and 0.00011 differ by only 0.00001, yet that difference is much bigger relative to the numbers themselves (10%).
So you should check the relative difference, for example:
func convert(input float64) (output float32, ok bool) {
const maxRelDiff = 1e-8
output = float32(input)
diff := math.Abs(float64(output) - input)
ok = diff <= math.Abs(input)*maxRelDiff
return
}
Testing it:
fmt.Println(convert(1))
fmt.Println(convert(1.5))
fmt.Println(convert(1e20))
fmt.Println(convert(math.Pi))
fmt.Println(convert(0.123456789))
fmt.Println(convert(math.MaxFloat32))
Output (try it on the Go Playground):
1 true
1.5 true
1e+20 false
3.1415927 false
0.12345679 false
3.4028235e+38 true
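The relative check above can be extended to cover the edge cases mentioned earlier. The following is only a sketch: the name convertChecked, the two error values, and the decision to pass NaN and infinities through unchanged are assumptions made for illustration, and maxRelDiff simply reuses the tolerance from the answer.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// maxRelDiff reuses the tolerance from the answer above.
const maxRelDiff = 1e-8

// Hypothetical error values, introduced only for this sketch.
var (
	errRange     = errors.New("value out of float32 range")
	errPrecision = errors.New("lost too much precision")
)

// convertChecked converts input to float32 and reports why the
// conversion is considered lossy, if it is.
func convertChecked(input float64) (float32, error) {
	// Assumption: NaN and infinities carry no precision to lose,
	// so they are passed through without an error.
	if math.IsNaN(input) || math.IsInf(input, 0) {
		return float32(input), nil
	}
	// Reject finite values whose magnitude exceeds the largest float32.
	if math.Abs(input) > math.MaxFloat32 {
		return 0, errRange
	}
	output := float32(input)
	diff := math.Abs(float64(output) - input)
	if diff > math.Abs(input)*maxRelDiff {
		return output, errPrecision
	}
	return output, nil
}

func main() {
	for _, v := range []float64{1.5, 0.1, 1e39, math.NaN()} {
		out, err := convertChecked(v)
		fmt.Println(v, out, err)
	}
}
```

Since the goal in the question is logging, the caller can log the returned error and still send the converted value.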

Yes. Check that the value does not exceed the upper or lower value limit. Then ensure the 52 - 23 least significant bits are 0. (in a nutshell)
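A rough sketch of that bit-level check, using math.Float64bits. The function name fitsFloat32 is made up for this example, and it makes two simplifying assumptions: zero, NaN and infinities are treated as fitting, and values that would only be representable as float32 subnormals are conservatively reported as not fitting.

```go
package main

import (
	"fmt"
	"math"
)

// fitsFloat32 reports whether x can be stored in a float32 without
// changing its value, by inspecting the float64 bit pattern.
func fitsFloat32(x float64) bool {
	// Assumption: these special values are treated as always fitting.
	if x == 0 || math.IsNaN(x) || math.IsInf(x, 0) {
		return true
	}
	bits := math.Float64bits(x)
	// Unbiased binary exponent of the float64 value.
	exp := int(bits>>52&0x7FF) - 1023
	// A normal float32 covers exponents -126..127. Smaller exponents
	// (float32 subnormals) are rejected here for simplicity.
	if exp < -126 || exp > 127 {
		return false
	}
	// float64 has 52 fraction bits, float32 has 23: the 29 lowest
	// bits must all be zero for the value to survive the conversion.
	const lowMask = 1<<(52-23) - 1
	return bits&lowMask == 0
}

func main() {
	fmt.Println(fitsFloat32(1.5))             // true
	fmt.Println(fitsFloat32(0.1))             // false
	fmt.Println(fitsFloat32(math.MaxFloat32)) // true
	fmt.Println(fitsFloat32(1e39))            // false
}
```

This is an exact test, so like the == comparison above it will report false for most inputs that are not already float32 values.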

Related

How do I parse a currency value as *big.Int in Go?

I want to parse a string like "12.49" into a *big.Int in Go. The resulting *big.Int should represent the amount of cents in the given value, in this case 1249. Here are some more examples of inputs and their expected outputs:
"3": 300
"3.1": 310
".19": 19
I already tried working with *big.Float and its Int function, but realized, that *big.Float does not provide arbitrary precision.
Right now I'm using this algorithm, but it seems fragile (Go Playground link):
func eurToCents(in string) *big.Int {
missingZerosUntilCents := 2
i := strings.Index(in, ".")
if i > -1 {
missingZerosUntilCents -= len(in) - i - 1
if missingZerosUntilCents < 0 {
panic("too many decimal places")
}
}
in = strings.Replace(in, ".", "", 1)
in += strings.Repeat("0", missingZerosUntilCents)
out, ok := big.NewInt(0).SetString(in, 10)
if !ok {
panic(fmt.Sprintf("could not parse '%s' as an integer", in))
}
return out
}
Is there a standard library function or other common way to parse currencies in Go? An external library is not an option.
PS: I'm parsing Nano cryptocurrency values, which have 30 decimal places and a maximum value of 133,248,297.0. That's why I'm asking for *big.Int and not uint64.

Update: Seems like this solution is still buggy, because an inaccurate result is reported after multiplication: https://play.golang.org/p/RS-DC6SeRwz
After revisiting the solution with *big.Float, I realized, that it does work perfectly fine. I think I forgot to use SetPrec on rawPerNano previously. I'm going to provide an example for the Nano cryptocurrency, because it requires many decimal places.
This works as expected (Go Playground link):
func nanoToRaw(in string) *big.Int {
f, _ := big.NewFloat(0).SetPrec(128).SetString(in)
rawPerNano, _ := big.NewFloat(0).SetPrec(128).SetString("1000000000000000000000000000000")
f.Mul(f, rawPerNano)
i, _ := f.Int(big.NewInt(0))
return i
}
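As an alternative to choosing a precision for *big.Float, the standard library's *big.Rat parses decimal strings exactly. This is a sketch rather than the approach used above; the function name toCents and its error messages are invented for the example. Note that Rat.SetString also accepts fractions such as "1/3" and exponents, so stricter input validation may be needed in practice.

```go
package main

import (
	"fmt"
	"math/big"
)

// toCents parses a decimal string and returns the amount in cents.
// It fails if the input has more than two decimal places.
func toCents(in string) (*big.Int, error) {
	r, ok := new(big.Rat).SetString(in)
	if !ok {
		return nil, fmt.Errorf("could not parse %q as a number", in)
	}
	// Scale to cents; the rational arithmetic is exact.
	r.Mul(r, big.NewRat(100, 1))
	if !r.IsInt() {
		return nil, fmt.Errorf("%q has too many decimal places", in)
	}
	// Copy the numerator so the result does not alias r.
	return new(big.Int).Set(r.Num()), nil
}

func main() {
	for _, s := range []string{"12.49", "3", "3.1", ".19", "0.001"} {
		cents, err := toCents(s)
		fmt.Println(s, cents, err)
	}
}
```

For other currencies the scale factor 100 would be replaced by the appropriate power of ten.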
Thanks @hymns-for-disco for nudging me in the right direction!

Can not use specific float32 values as map key

Using float32 as map key returns unexpected result
package main
import "fmt"
func main() {
result := make(map[float32]map[float32]float32)
var t1 float32 = 1586238540
var t2 float32 = 1586238600
result[t1] = map[float32]float32{1:1,2:2}
result[t2] = map[float32]float32{3:3,4:4}
fmt.Println(result[t1])
fmt.Println(result[t2])
}
map[3:3 4:4]
map[3:3 4:4]
Go version: go version go1.14 linux/amd64
Changing result to map[float64]map[float32]float32 and t1, t2 accordingly gives the right result.
What could be a reason for this weird behavior?

A 32-bit float has a 23-bit mantissa, with an implicit leading 1 bit, giving 24 significant bits. So every integer from -16777216 to 16777216 (±2²⁴) can be represented exactly as a 32-bit float. Beyond that range only some integers are representable, and the gaps between them grow with the magnitude.
Your two values 1586238540 and 1586238600 are outside that range. Near these values a float32 can only represent multiples of 128, so both get rounded to the same value, 1586238592. It is that rounded value that is used as the map key.
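The rounding can be observed directly by converting to float32 and back. This is a small illustrative sketch; the helper name roundTrip is made up for the example.

```go
package main

import "fmt"

// roundTrip converts x to float32 and back, returning the value
// that a float32 variable would actually hold.
func roundTrip(x float64) float64 {
	return float64(float32(x))
}

func main() {
	fmt.Printf("%.0f\n", roundTrip(1586238540)) // 1586238592
	fmt.Printf("%.0f\n", roundTrip(1586238600)) // 1586238592
	fmt.Printf("%.0f\n", roundTrip(16777216))   // 16777216
	fmt.Printf("%.0f\n", roundTrip(16777217))   // 16777216
}
```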

See https://play.golang.org/p/Fx78BbmnXIE: as float32 values, 1586238540 and 1586238600 have the same representation in memory.

If you add this to your code:
fmt.Println(t1)
fmt.Println(t2)
you'll see 1.5862386e+09 printed for both, because a float32 does not have enough precision to tell the two values apart. With float64 you'll see the distinct values printed:
1.58623854e+09
1.5862386e+09
For more information, see Wikipedia.

Why is a float64 type number throwing int related error in Go

I am trying to grasp Golang. One of the tutorial examples says that an untyped constant takes the type needed by its context.
package main
import "fmt"
const (
// Create a huge number by shifting a 1 bit left 100 places.
// In other words, the binary number that is 1 followed by 100 zeroes.
Big = 1 << 100
// Shift it right again 99 places, so we end up with 1<<1, or 2.
Small = Big >> 99
)
func needInt(x int) int { return x*10 + 1 }
func needFloat(x float64) float64 {
return x * 0.1
}
func main() {
fmt.Println(needInt(Small))
fmt.Println(needFloat(Small))
// Here Big is too large of a number but can be handled as a float64.
// No compilation error is thrown here.
fmt.Println(needFloat(Big))
// The below line throws the following compilation error
// constant 1267650600228229401496703205376 overflows int
fmt.Println(Big)
}
When calling fmt.Println(Big), why is Golang treating Big as an int, whereas by context it should be float64?
What am I missing?

What is the context for fmt.Println? In other words, what does fmt.Println expect Big to be? An interface{}.
From the Go Blog on Constants:
What happens when fmt.Printf is called with an untyped constant is that an interface value is created to pass as an argument, and the concrete type stored for that argument is the default type of the constant.
So the default type of the constant must be an int. The page goes on to talk about how the defaults get determined based on syntax, not necessarily the value of the const.
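A short sketch of how the default type follows from the constant's syntax when it is passed as an interface value. The helper typeName is invented for this example.

```go
package main

import "fmt"

const Big = 1 << 100

// typeName returns the concrete type stored in the interface value,
// which for an untyped constant is its default type.
func typeName(v interface{}) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	// Integer, floating-point, rune and imaginary literals each
	// have their own default type.
	fmt.Println(typeName(1), typeName(1.0), typeName('x'), typeName(1i))
	// Prints: int float64 int32 complex128

	// typeName(Big) would not compile: Big defaults to int and overflows.
	// An explicit conversion gives it a type that can hold the value.
	fmt.Println(typeName(float64(Big))) // float64
}
```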

In fmt.Println(Big), Big takes its default type int, but its value is larger than the maximum int value, 9223372036854775807 (on a 64-bit platform).
You can compute the maximum int like this:
const MaxUint = ^uint(0)
const MaxInt = int(MaxUint >> 1)
fmt.Println(MaxInt) // prints 9223372036854775807
To fix it, convert Big to float64 explicitly:
fmt.Println(float64(Big))

Go print large number

I am currently doing the Go Lang tutorial, "Numeric Constants" to be precise. The example code starts with the following statement:
const (
// Create a huge number by shifting a 1 bit left 100 places.
// In other words, the binary number that is 1 followed by 100 zeroes.
Big = 1 << 100
// Shift it right again 99 places, so we end up with 1<<1, or 2.
Small = Big >> 99
)
The constant Big is obviously huge, and I am trying to print it and its type, like this:
fmt.Printf("%T", Big)
fmt.Println(Big)
However, I get the following error for both lines:
# command-line-arguments
./compile26.go:19: constant 1267650600228229401496703205376 overflows int
I tried converting Big to some other type, such as uint64, but that overflowed with the same error. I also tried converting it to a string, but Big.String() gives the following error:
Big.String undefined (type int has no field or method String)
It appears that its type is int, yet I can't print it or cast it to anything and it overflows all methods. What do I do with this number/object and how do I print it?

That value is larger than any 64-bit integer type can hold, so you have no way of manipulating it directly as an integer.
If you need to write a numeric constant that can only be manipulated with the math/big package, you need to store it serialized in a format that package can consume. Easiest way is probably to use a base 10 string:
https://play.golang.org/p/Mzwox3I2SL
bigNum := "1267650600228229401496703205376"
b, ok := big.NewInt(0).SetString(bigNum, 10)
fmt.Println(ok, b)
// true 1267650600228229401496703205376
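If the goal is only to reproduce 1 << 100 at run time, the value can also be built with math/big instead of being parsed from a string. This is a minimal sketch; the helper name pow2 is made up for the example.

```go
package main

import (
	"fmt"
	"math/big"
)

// pow2 returns 2**n as an arbitrary-precision integer.
func pow2(n uint) *big.Int {
	return new(big.Int).Lsh(big.NewInt(1), n)
}

func main() {
	fmt.Println(pow2(100)) // 1267650600228229401496703205376
}
```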

Why is there no type mismatch error?

I declare the number to be entered by the user as var input float64, and then I enter an integer. I would expect to get an error, but I get err = <nil>. What am I missing?
package main
import (
"fmt"
)
func main() {
var input float64
fmt.Print("Enter a number:")
n, err := fmt.Scanf("%f\n", &input)
fmt.Printf("err = %v\n", err)
if err != nil {
fmt.Printf("%v is not a float - exiting with error: %v\n", input, err)
return
}
fmt.Printf("n is %v:", n)
}
This is the output:
C:\Go\src\play\exercise>go run exercise2.go
Enter a number to take its square root: 1
err = <nil>
n is 1:

The Go Programming Language Specification
Conversions
Specific rules apply to (non-constant) conversions between numeric
types.
Conversions between numeric types
For the conversion of non-constant numeric values, the following rules
apply:
When converting between integer types, if the value is a signed
integer, it is sign extended to implicit infinite precision;
otherwise it is zero extended. It is then truncated to fit in the
result type's size. For example, if v := uint16(0x10F0), then
uint32(int8(v)) == 0xFFFFFFF0. The conversion always yields a valid
value; there is no indication of overflow.
When converting a floating-point number to an integer, the fraction
is discarded (truncation towards zero).
When converting an integer or floating-point number to a
floating-point type, or a complex number to another complex type,
the result value is rounded to the precision specified by the
destination type. For instance, the value of a variable x of type
float32 may be stored using additional precision beyond that of an
IEEE-754 32-bit number, but float32(x) represents the result of
rounding x's value to 32-bit precision. Similarly, x + 0.1 may use
more than 32 bits of precision, but float32(x + 0.1) does not.
In all non-constant conversions involving floating-point or complex
values, if the result type cannot represent the value the conversion
succeeds but the result value is implementation-dependent.
In Go, you can convert from an integer type to a floating-point type. Therefore, there is no reason not to be forgiving.
Robustness principle
In computing, the robustness principle is a general design guideline
for software:
Be conservative in what you do, be liberal in what you accept from others
(often reworded as "Be conservative in what you send, be
liberal in what you accept").
The principle is also known as Postel's law, after Internet pioneer
Jon Postel, who wrote in an early specification of the Transmission
Control Protocol that:
TCP implementations should follow a general principle of robustness: be conservative in what you do, be liberal in what you
accept from others.
In other words, code that sends commands or data to other machines (or
to other programs on the same machine) should conform completely to
the specifications, but code that receives input should accept
non-conformant input as long as the meaning is clear.

Scanf scans text read from standard input, storing successive space-separated values into successive arguments as determined by the format. It returns the number of items successfully scanned.
Scanf returns the number of things it read. In this case n == 1 because... you entered one token followed by a newline. Presumably, you want the value of input, not n.
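The same behaviour can be reproduced without typing input by scanning from a string with fmt.Sscanf. The helper scanFloat below is a hypothetical wrapper written for this illustration.

```go
package main

import "fmt"

// scanFloat scans a single %f value from s and returns the parsed
// value, the number of items scanned, and any error.
func scanFloat(s string) (value float64, n int, err error) {
	n, err = fmt.Sscanf(s, "%f", &value)
	return
}

func main() {
	// An integer literal is valid input for %f, so there is no error.
	fmt.Println(scanFloat("1"))
	// Non-numeric text fails to scan and returns a non-nil error.
	fmt.Println(scanFloat("abc"))
}
```

The first call returns the value 1 with n == 1 and a nil error, which matches what the question observed.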
