Is there a safe way to cast float64 to int64 in Go? - go

I am aware of the int() or int64() typecast functions, the Go provides.
Is there a safe way to cast float64 to int64 (with truncation,) given the largest representable 64-bit floating point number math.MaxFloat64 is much larger than math.MaxInt64? A way, that triggers an overflow error?

If you are converting (not casting or typecasting) from a float64 to an int64, then you can compare the float value to the closest double precision integer values of math.MaxInt64 and math.MinInt64. These numbers cannot be exactly stored in a float64 value, however they will be stored as the next integer value above or below those:
float64(math.MaxInt64) // 9223372036854775808
float64(math.MinInt64) // -9223372036854775808
So as long as your float64 value is between these two values, you know it can be converted to an int64 without overflow.
Note however that if you are converting in the other direction, you cannot guarantee that a float64 can accurately represent all int64 values up to MaxInt64, since a double precision float point value only holds 52 bits of precision. In that case you must check that the int64 value is between -1<<53 and 1<<53.
In either case you can handle larger values with the math/big package.

Related

How go compare overflow int? [duplicate]

I've been reading this post on constants in Go, and I'm trying to understand how they are stored and used in memory. You can perform operations on very large constants in Go, and as long as the result fits in memory, you can coerce that result to a type. For example, this code prints 10, as you would expect:
const Huge = 1e1000
fmt.Println(Huge / 1e999)
How does this work under the hood? At some point, Go has to store 1e1000 and 1e999 in memory, in order to perform operations on them. So how are constants stored, and how does Go perform arithmetic on them?
Short summary (TL;DR) is at the end of the answer.
Untyped arbitrary-precision constants don't live at runtime, constants live only at compile time (during the compilation). That being said, Go does not have to represent constants with arbitrary precision at runtime, only when compiling your application.
Why? Because constants do not get compiled into the executable binaries. They don't have to be. Let's take your example:
const Huge = 1e1000
fmt.Println(Huge / 1e999)
There is a constant Huge in the source code (and will be in the package object), but it won't appear in your executable. Instead a function call to fmt.Println() will be recorded with a value passed to it, whose type will be float64. So in the executable only a float64 value being 10.0 will be recorded. There is no sign of any number being 1e1000 in the executable.
This float64 type is derived from the default type of the untyped constant Huge. 1e1000 is a floating-point literal. To verify it:
const Huge = 1e1000
x := Huge / 1e999
fmt.Printf("%T", x) // Prints float64
Back to the arbitrary precision:
Spec: Constants:
Numeric constants represent exact values of arbitrary precision and do not overflow.
So constants represent exact values of arbitrary precision. As we saw, there is no need to represent constants with arbitrary precision at runtime, but the compiler still has to do something at compile time. And it does!
Obviously "infinite" precision cannot be dealt with. But there is no need, as the source code itself is not "infinite" (size of the source is finite). Still, it's not practical to allow truly arbitrary precision. So the spec gives some freedom to compilers regarding to this:
Implementation restriction: Although numeric constants have arbitrary precision in the language, a compiler may implement them using an internal representation with limited precision. That said, every implementation must:
Represent integer constants with at least 256 bits.
Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed exponent of at least 32 bits.
Give an error if unable to represent an integer constant precisely.
Give an error if unable to represent a floating-point or complex constant due to overflow.
Round to the nearest representable constant if unable to represent a floating-point or complex constant due to limits on precision.
These requirements apply both to literal constants and to the result of evaluating constant expressions.
However, also note that when all the above said, the standard package provides you the means to still represent and work with values (constants) with "arbitrary" precision, see package go/constant. You may look into its source to get an idea how it's implemented.
Implementation is in go/constant/value.go. Types representing such values:
// A Value represents the value of a Go constant.
type Value interface {
// Kind returns the value kind.
Kind() Kind
// String returns a short, human-readable form of the value.
// For numeric values, the result may be an approximation;
// for String values the result may be a shortened string.
// Use ExactString for a string representing a value exactly.
String() string
// ExactString returns an exact, printable form of the value.
ExactString() string
// Prevent external implementations.
implementsValue()
}
type (
unknownVal struct{}
boolVal bool
stringVal string
int64Val int64 // Int values representable as an int64
intVal struct{ val *big.Int } // Int values not representable as an int64
ratVal struct{ val *big.Rat } // Float values representable as a fraction
floatVal struct{ val *big.Float } // Float values not representable as a fraction
complexVal struct{ re, im Value }
)
As you can see, the math/big package is used to represent untyped arbitrary precision values. big.Int is for example (from math/big/int.go):
// An Int represents a signed multi-precision integer.
// The zero value for an Int represents the value 0.
type Int struct {
neg bool // sign
abs nat // absolute value of the integer
}
Where nat is (from math/big/nat.go):
// An unsigned integer x of the form
//
// x = x[n-1]*_B^(n-1) + x[n-2]*_B^(n-2) + ... + x[1]*_B + x[0]
//
// with 0 <= x[i] < _B and 0 <= i < n is stored in a slice of length n,
// with the digits x[i] as the slice elements.
//
// A number is normalized if the slice contains no leading 0 digits.
// During arithmetic operations, denormalized values may occur but are
// always normalized before returning the final result. The normalized
// representation of 0 is the empty or nil slice (length = 0).
//
type nat []Word
And finally Word is (from math/big/arith.go)
// A Word represents a single digit of a multi-precision unsigned integer.
type Word uintptr
Summary
At runtime: predefined types provide limited precision, but you can "mimic" arbitrary precision with certain packages, such as math/big and go/constant. At compile time: constants seemingly provide arbitrary precision, but in reality a compiler may not live up to this (doesn't have to); but still the spec provides minimal precision for constants that all compiler must support, e.g. integer constants must be represented with at least 256 bits which is 32 bytes (compared to int64 which is "only" 8 bytes).
When an executable binary is created, results of constant expressions (with arbitrary precision) have to be converted and represented with values of finite precision types – which may not be possible and thus may result in compile-time errors. Note that only results –not intermediate operands– have to be converted to finite precision, constant operations are carried out with arbitrary precision.
How this arbitrary or enhanced precision is implemented is not defined by the spec, math/big for example stores "digits" of the number in a slice (where digits is not a digit of the base 10 representation, but "digit" is an uintptr which is like base 4294967295 representation on 32-bit architectures, and even bigger on 64-bit architectures).
Go constants are not allocated to memory. They are used in context by the compiler. The blog post you refer to gives the example of Pi:
Pi = 3.14159265358979323846264338327950288419716939937510582097494459
If you assign Pi to a float32 it will lose precision to fit, but if you assign it to a float64, it will lose less precision, but the compiler will determine what type to use.

golang losing precision while converting float32 to float64?

In Golang, it seems that when a float64 var first convert to float32 then convert float64, it's value will change.
a := -8888.95
fmt.Println(a) // -8888.95
fmt.Println(float32(a)) // -8888.95
fmt.Println(float64(float32(a))) // -8888.9501953125
How can I make it unchanging
The way you have described the problem is perhaps misleading.
The precision is not lost "when converting float32 to float64"; rather, it is lost when converting from float64 to float32.
So how can you avoid losing precision when converting from float64 to float32? You can't. This task is impossible, and it's quite easy to see the reason why:
float64 has twice as many bits as float32
multiple different float64 values will map to the same float32 value due to the pigeonhole principle
the conversion is therefore not reversible.
package main
import (
"fmt"
)
func main() {
a := -8888.95
fmt.Printf("%.20f\n", a)
fmt.Printf("%.20f\n", float32(a))
fmt.Printf("%.20f\n", float64(float32(a)))
}
Adjusting your program to show a more precise output of each value, you'll see exactly where the precision is lost:
-8888.95000000000072759576
-8888.95019531250000000000
-8888.95019531250000000000
That is, after the float32 conversion (as is expected).
It's also worth noting that neither float64 nor float32 can represent your value -8888.95 exactly. If you convert this number to a fraction, you will get -177779/20. Notice the denominator, 20. The prime factorization of 20 is 2 * 2 * 5.
If you apply this process to a number and the prime factorization of the denominator contains any factors which are NOT 2, then you can rest assured that this number is definitely not representable exactly in binary floating point form. You may discover that the probability of any number passing this test is quite low.

When casting an int64 to uint64, is the sign retained?

I have an int64 variable which contains a negative number and I wish to subtract it from an uint64 variable which contains a positive number:
var endTime uint64
now := time.Now().Unix()
endTime = uint64(now)
var interval int64
interval = -3600
endTime = endTime + uint64(interval)
The above code appears to work but I wonder if I can rely on this. I am surprised, being new to Go, that after casting a negative number to uint64 that it remains negative -- I had planned to subtract the now positive value (after casting) to get what I wanted.
Converting a signed number to an unsigned number will not remain negative, it can't, as the valid range of unsigned types doesn't include negative numbers. If you print uint(interval), you will certainly see a positive number printed.
What you experience is deterministic and you can rely on it (but it doesn't mean you should). It is the result of Go (and most other programming languages) storing signed integer types using the 2's completement representation.
What this means is that in case of negative numbers, using n bit, the value -x (where x is positive) is stored as the binary representation of the positive value 2^n - x. This has the advantage that numbers can be added bitwise, and the result will be correct regardless of whether they are negative or positive.
So when you have a signed negative number, it is basically stored in memory like if you would subtract its absolute value from 0. Which means that if you convert a negative, signed value to unsigned, and you add that to an unsigned value, the result will be correct because overflow will happen, in a useful way.
Converting a value of type int64 to uint64 does not change the memory layout, only the type. So what 8 bytes the int64 had, the converted uint64 will have those same 8 bytes. And as mentioned above, the representation stored in those 8 bytes is the bit pattern identical to the bit pattern of value 0 - abs(x). So the result of the conversion will be a number that you would get if you would subtract abs(x) from 0, in the unsigned world. Yes, this won't be negative (as the type is unsigned), instead it will be a "big" number, counting down from the max value of the uint64 type. But if you add an y number bigger than abs(x) to this "big" number, overflow will happen, and the result will be like y - abs(x).
See this simple example demonstrating what's happening (try it on the Go Playground):
a := uint8(100)
b := int8(-10)
fmt.Println(uint8(b)) // Prints 226 which is: 0 - 10 = 256 - 10
a = a + uint8(b)
fmt.Println(a) // Prints 90 which is: 100 + 226 = 326 = 90
// after overflow: 326 - 256 = 90
As mentioned above, you should not rely on this, as this may cause confusion. If you intend to work with negative numbers, then use signed types.
And if you work with a code base that already uses uint64 values, then do a subtraction instead of addition, using uint64 values:
interval := uint64(3600)
endTime -= interval
Also note that if you have time.Time values, you should take advantage of its Time.Add() method:
func (t Time) Add(d Duration) Time
You may specify a time.Duration to add to the time, which may be negative if you want to go back in time, like this:
t := time.Now()
t = t.Add(-3600 * time.Second)
time.Duration is more expressive: we see the value we specified above uses seconds explicitly.

How does Go perform arithmetic on constants?

I've been reading this post on constants in Go, and I'm trying to understand how they are stored and used in memory. You can perform operations on very large constants in Go, and as long as the result fits in memory, you can coerce that result to a type. For example, this code prints 10, as you would expect:
const Huge = 1e1000
fmt.Println(Huge / 1e999)
How does this work under the hood? At some point, Go has to store 1e1000 and 1e999 in memory, in order to perform operations on them. So how are constants stored, and how does Go perform arithmetic on them?
Short summary (TL;DR) is at the end of the answer.
Untyped arbitrary-precision constants don't live at runtime, constants live only at compile time (during the compilation). That being said, Go does not have to represent constants with arbitrary precision at runtime, only when compiling your application.
Why? Because constants do not get compiled into the executable binaries. They don't have to be. Let's take your example:
const Huge = 1e1000
fmt.Println(Huge / 1e999)
There is a constant Huge in the source code (and will be in the package object), but it won't appear in your executable. Instead a function call to fmt.Println() will be recorded with a value passed to it, whose type will be float64. So in the executable only a float64 value being 10.0 will be recorded. There is no sign of any number being 1e1000 in the executable.
This float64 type is derived from the default type of the untyped constant Huge. 1e1000 is a floating-point literal. To verify it:
const Huge = 1e1000
x := Huge / 1e999
fmt.Printf("%T", x) // Prints float64
Back to the arbitrary precision:
Spec: Constants:
Numeric constants represent exact values of arbitrary precision and do not overflow.
So constants represent exact values of arbitrary precision. As we saw, there is no need to represent constants with arbitrary precision at runtime, but the compiler still has to do something at compile time. And it does!
Obviously "infinite" precision cannot be dealt with. But there is no need, as the source code itself is not "infinite" (size of the source is finite). Still, it's not practical to allow truly arbitrary precision. So the spec gives some freedom to compilers regarding to this:
Implementation restriction: Although numeric constants have arbitrary precision in the language, a compiler may implement them using an internal representation with limited precision. That said, every implementation must:
Represent integer constants with at least 256 bits.
Represent floating-point constants, including the parts of a complex constant, with a mantissa of at least 256 bits and a signed exponent of at least 32 bits.
Give an error if unable to represent an integer constant precisely.
Give an error if unable to represent a floating-point or complex constant due to overflow.
Round to the nearest representable constant if unable to represent a floating-point or complex constant due to limits on precision.
These requirements apply both to literal constants and to the result of evaluating constant expressions.
However, also note that when all the above said, the standard package provides you the means to still represent and work with values (constants) with "arbitrary" precision, see package go/constant. You may look into its source to get an idea how it's implemented.
Implementation is in go/constant/value.go. Types representing such values:
// A Value represents the value of a Go constant.
type Value interface {
// Kind returns the value kind.
Kind() Kind
// String returns a short, human-readable form of the value.
// For numeric values, the result may be an approximation;
// for String values the result may be a shortened string.
// Use ExactString for a string representing a value exactly.
String() string
// ExactString returns an exact, printable form of the value.
ExactString() string
// Prevent external implementations.
implementsValue()
}
type (
unknownVal struct{}
boolVal bool
stringVal string
int64Val int64 // Int values representable as an int64
intVal struct{ val *big.Int } // Int values not representable as an int64
ratVal struct{ val *big.Rat } // Float values representable as a fraction
floatVal struct{ val *big.Float } // Float values not representable as a fraction
complexVal struct{ re, im Value }
)
As you can see, the math/big package is used to represent untyped arbitrary precision values. big.Int is for example (from math/big/int.go):
// An Int represents a signed multi-precision integer.
// The zero value for an Int represents the value 0.
type Int struct {
neg bool // sign
abs nat // absolute value of the integer
}
Where nat is (from math/big/nat.go):
// An unsigned integer x of the form
//
// x = x[n-1]*_B^(n-1) + x[n-2]*_B^(n-2) + ... + x[1]*_B + x[0]
//
// with 0 <= x[i] < _B and 0 <= i < n is stored in a slice of length n,
// with the digits x[i] as the slice elements.
//
// A number is normalized if the slice contains no leading 0 digits.
// During arithmetic operations, denormalized values may occur but are
// always normalized before returning the final result. The normalized
// representation of 0 is the empty or nil slice (length = 0).
//
type nat []Word
And finally Word is (from math/big/arith.go)
// A Word represents a single digit of a multi-precision unsigned integer.
type Word uintptr
Summary
At runtime: predefined types provide limited precision, but you can "mimic" arbitrary precision with certain packages, such as math/big and go/constant. At compile time: constants seemingly provide arbitrary precision, but in reality a compiler may not live up to this (doesn't have to); but still the spec provides minimal precision for constants that all compiler must support, e.g. integer constants must be represented with at least 256 bits which is 32 bytes (compared to int64 which is "only" 8 bytes).
When an executable binary is created, results of constant expressions (with arbitrary precision) have to be converted and represented with values of finite precision types – which may not be possible and thus may result in compile-time errors. Note that only results –not intermediate operands– have to be converted to finite precision, constant operations are carried out with arbitrary precision.
How this arbitrary or enhanced precision is implemented is not defined by the spec, math/big for example stores "digits" of the number in a slice (where digits is not a digit of the base 10 representation, but "digit" is an uintptr which is like base 4294967295 representation on 32-bit architectures, and even bigger on 64-bit architectures).
Go constants are not allocated to memory. They are used in context by the compiler. The blog post you refer to gives the example of Pi:
Pi = 3.14159265358979323846264338327950288419716939937510582097494459
If you assign Pi to a float32 it will lose precision to fit, but if you assign it to a float64, it will lose less precision, but the compiler will determine what type to use.

On-purpose int overflow

I'm using the hash function murmur2 which returns me an uint64.
I want then to store it in PostgreSQL, which only support BIGINT (signed 64 bits).
As I'm not interested in the number itself, but just the binary value (as I use it as an id for detecting uniqueness (my set of values being of ~1000 values, a 64bit hash is enough for me) I would like to convert it into int64 by "just" changing the type.
How does one do that in a way that pleases the compiler?
You can simply use a type conversion:
i := uint64(0xffffffffffffffff)
i2 := int64(i)
fmt.Println(i, i2)
Output:
18446744073709551615 -1
Converting uint64 to int64 always succeeds: it doesn't change the memory representation just the type. What may confuse you is if you try to convert an untyped integer constant value to int64:
i3 := int64(0xffffffffffffffff) // Compile time error!
This is a compile time error as the constant value 0xffffffffffffffff (which is represented with arbitrary precision) does not fit into int64 because the max value that fits into int64 is 0x7fffffffffffffff:
constant 18446744073709551615 overflows int64

Resources