int64(math.Pow(2, 63) - 1) results in -9223372036854775808 rather than 9223372036854775807 - go

I am trying to store the max and min signed ints of different sizes. The code works just fine for ints other than int64:
package main

import (
    "fmt"
    "math"
)

func main() {
    var minInt8 int8 = -128
    var maxInt8 int8 = 127
    fmt.Println("int8\t->", minInt8, "to", maxInt8)
    fmt.Println("int8\t->", math.MinInt8, "to", math.MaxInt8)

    var minInt16 int16 = int16(math.Pow(-2, 15))
    var maxInt16 int16 = int16(math.Pow(2, 15) - 1)
    fmt.Println("int16\t->", minInt16, "to", maxInt16)
    fmt.Println("int16\t->", math.MinInt16, "to", math.MaxInt16)

    var minInt32 int32 = int32(math.Pow(-2, 31))
    var maxInt32 int32 = int32(math.Pow(2, 31) - 1)
    fmt.Println("int32\t->", minInt32, "to", maxInt32)
    fmt.Println("int32\t->", math.MinInt32, "to", math.MaxInt32)

    var minInt64 int64 = int64(math.Pow(-2, 63))
    var maxInt64 int64 = int64(math.Pow(2, 63) - 1) // gives me the wrong output
    fmt.Println("int64\t->", minInt64, "to", maxInt64)
    fmt.Println("int64\t->", math.MinInt64, "to", math.MaxInt64)
}
Output:
int8 -> -128 to 127
int8 -> -128 to 127
int16 -> -32768 to 32767
int16 -> -32768 to 32767
int32 -> -2147483648 to 2147483647
int32 -> -2147483648 to 2147483647
int64 -> -9223372036854775808 to -9223372036854775808
int64 -> -9223372036854775808 to 9223372036854775807
I have no idea what causes this behavior; any help would be appreciated.

There are multiple problems here:
math.Pow returns a float64. This type cannot be used to represent a 64-bit signed integer with full precision, as required for the attempted computation here. To cite from Double-precision floating-point format:

Precision limitations on integer values
Integers from −2^53 to 2^53 (−9,007,199,254,740,992 to 9,007,199,254,740,992) can be exactly represented.
Integers between 2^53 and 2^54 = 18,014,398,509,481,984 round to a multiple of 2 (even number).
Integers between 2^54 and 2^55 = 36,028,797,018,963,968 round to a multiple of 4.
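You can observe this rounding directly (a quick sketch of my own, not from the cited article):

package main

import "fmt"

func main() {
    // Above 2^53, float64 can no longer represent every integer.
    a := float64(1 << 53)   // 9007199254740992
    fmt.Println(a == a+1)   // true: a+1 rounds back down to a
    b := float64(1<<53 + 2) // exactly representable (a multiple of 2)
    fmt.Println(b == a+2)   // true
}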
Even if the precision were sufficient (which is true in the special case of 2^63), float64 does not have enough precision to subtract 1 from 2^63. Just try the following (uint64 is used here since a signed int64 is not sufficient):
uint64(math.Pow(2, 63)) // -> 9223372036854775808
uint64(math.Pow(2, 63)-1) // -> 9223372036854775808
Converting the value first to uint64 and then subtracting works instead, but only because 2^63 can be represented with full precision in float64, even though other values of this size cannot:
uint64(math.Pow(2, 63))-1 // -> 9223372036854775807
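For completeness, the reliable way to obtain these limits avoids floating point entirely (a short sketch; math.MinInt64 and math.MaxInt64 are the constants the standard library already provides):

package main

import (
    "fmt"
    "math"
)

func main() {
    // Constant shift expressions are evaluated with arbitrary precision,
    // so no float64 rounding is involved.
    var minInt64 int64 = -1 << 63
    var maxInt64 int64 = 1<<63 - 1
    fmt.Println(minInt64 == math.MinInt64, maxInt64 == math.MaxInt64) // true true
}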

Related

How to coerce math.Inf to an integer?

I've got some code I'm using to do comparisons, and I want to start with infinite values. Here's a snippet of my code.
import (
    "fmt"
    "math"
)

func snippet(arr []int) {
    least := int(math.Inf(1))
    greatest := int(math.Inf(-1))
    fmt.Println("least", math.Inf(1), least)
    fmt.Println("greatest", math.Inf(-1), greatest)
}
and here's the output I get from the console
least +Inf -9223372036854775808
greatest -Inf -9223372036854775808
Why is +Inf coerced into a negative int?
Infinity is not representable by int.
According to the go spec,
In all non-constant conversions involving floating-point or complex values, if the result type cannot represent the value the conversion succeeds but the result value is implementation-dependent.
Maybe you are looking for the largest representable int? How to get it is explained here.
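For example, here is a sketch of that approach applied to the snippet (my own illustration, using math.MaxInt and math.MinInt, available since Go 1.17):

package main

import (
    "fmt"
    "math"
)

func snippet(arr []int) {
    // Start from the extreme representable ints instead of +/-Inf.
    least := math.MaxInt
    greatest := math.MinInt
    for _, v := range arr {
        if v < least {
            least = v
        }
        if v > greatest {
            greatest = v
        }
    }
    fmt.Println("least", least, "greatest", greatest)
}

func main() {
    snippet([]int{3, -7, 42}) // least -7 greatest 42
}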
math.Inf() returns an IEEE double-precision float representing positive infinity if the sign of the argument is >= 0, and negative infinity if the sign is < 0, so your code is incorrect.
But the Go language specification (always good to read the specifications) says this:
Conversions between numeric types
…
In all non-constant conversions involving floating-point or complex values, if the result type cannot represent the value the conversion succeeds but the result value is implementation-dependent.
Two's complement integer values don't have the concept of infinity, so the result is implementation dependent.
Myself, I'd have expected to get the largest or smallest integer value for the integer type the cast is targeting, but apparently that's not the case.
That points to the runtime source file responsible for the conversion, https://go.dev/src/runtime/softfloat64.go
And this is the actual source code.
Note that an IEEE-754 double-precision float is a 64-bit double word, consisting of
a sign bit, the high-order (most significant/leftmost) bit, with 0 indicating positive and 1 indicating negative,
an exponent (biased), consisting of the next 11 bits, and
a mantissa, consisting of the remaining 52 bits, which can be denormalized.
Positive infinity is a special value with a sign bit of 0, an exponent of all 1 bits, and a mantissa of all 0 bits:
0 11111111111 0000000000000000000000000000000000000000000000000000
or 0x7FF0000000000000.
Negative infinity is the same, with the exception that the sign bit is 1:
1 11111111111 0000000000000000000000000000000000000000000000000000
or 0xFFF0000000000000.
Looks like funpack64() returns 5 values:
a uint64 representing the sign (0 or the very large non-zero value 0x8000000000000000),
a uint64 representing the normalized mantissa,
an int representing the exponent,
a bool indicating whether or not this is +/- infinity, and
a bool indicating whether or not this is NaN.
From that, you should be able to figure out why it returns the value it does.
[Frankly, I'm surprised that f64toint() doesn't short-circuit when funpack64() returns fi = true.]
const mantbits64 uint = 52
const expbits64 uint = 11
const bias64 = -1<<(expbits64-1) + 1

func f64toint(f uint64) (val int64, ok bool) {
    fs, fm, fe, fi, fn := funpack64(f)
    switch {
    case fi, fn: // NaN
        return 0, false
    case fe < -1: // f < 0.5
        return 0, false
    case fe > 63: // f >= 2^63
        if fs != 0 && fm == 0 { // f == -2^63
            return -1 << 63, true
        }
        if fs != 0 {
            return 0, false
        }
        return 0, false
    }
    for fe > int(mantbits64) {
        fe--
        fm <<= 1
    }
    for fe < int(mantbits64) {
        fe++
        fm >>= 1
    }
    val = int64(fm)
    if fs != 0 {
        val = -val
    }
    return val, true
}

func funpack64(f uint64) (sign, mant uint64, exp int, inf, nan bool) {
    sign = f & (1 << (mantbits64 + expbits64))
    mant = f & (1<<mantbits64 - 1)
    exp = int(f>>mantbits64) & (1<<expbits64 - 1)
    switch exp {
    case 1<<expbits64 - 1:
        if mant != 0 {
            nan = true
            return
        }
        inf = true
        return
    case 0:
        // denormalized
        if mant != 0 {
            exp += bias64 + 1
            for mant < 1<<mantbits64 {
                mant <<= 1
                exp--
            }
        }
    default:
        // add implicit top bit
        mant |= 1 << mantbits64
        exp += bias64
    }
    return
}
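From that bit layout, you can reproduce what funpack64 sees for +Inf using the user-level math API (a small sketch of my own, not runtime code):

package main

import (
    "fmt"
    "math"
)

func main() {
    f := math.Float64bits(math.Inf(1)) // 0x7FF0000000000000
    sign := f >> 63
    exp := (f >> 52) & 0x7FF
    mant := f & (1<<52 - 1)
    // An all-ones exponent with a zero mantissa marks infinity, so the
    // conversion has no meaningful integer to produce; the spec then
    // allows an implementation-dependent result.
    fmt.Printf("sign=%d exp=%#x mant=%#x\n", sign, exp, mant) // sign=0 exp=0x7ff mant=0x0
}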

How to encode struct into binary with bit packing in Golang

I am trying to encode large data structs into binary. I have specified the number of bits for each struct element, so I need to encode the struct according to those bit lengths. The standard Go library encoding/binary packs each item as at least one byte, therefore I need another solution. How can I encode struct elements with a specified number of bits in Go?
For example: Item1 = 00001101 and Item2 = 00000110 would pack to 01101110 (see the sketch after the struct definitions below).
type Elements struct {
    Item1 uint8  // number of bits = 5
    Item2 uint8  // number of bits = 3
    Item3 uint8  // number of bits = 2
    Item4 uint64 // number of bits = 60
    Item5 uint16 // number of bits = 11
    Item6 []byte // bit length = 8
    Item7 Others
}

type Others struct {
    Other1 uint8  // number of bits = 4
    Other2 uint32 // number of bits = 21
    Other3 uint16 // number of bits = 9
}
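A common approach is manual shifting and masking. Below is a minimal sketch (my own illustration, not a full encoder) covering just the Item1/Item2 example above; a complete solution would walk every field of Elements the same way, accumulating bits into a []byte:

package main

import "fmt"

// packTwo packs a 5-bit value and a 3-bit value into a single byte:
// item1 occupies the high 5 bits, item2 the low 3 bits.
func packTwo(item1, item2 uint8) byte {
    return (item1&0x1F)<<3 | (item2 & 0x07)
}

func main() {
    b := packTwo(0b01101, 0b110) // Item1 = 00001101, Item2 = 00000110
    fmt.Printf("%08b\n", b)      // 01101110
}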

IEEE 754 binary floating-point numbers imprecise for money

I have a problem when I use math.Floor with a floating-point variable (round down/truncate the precision part). How can I do it correctly?
package main

import (
    "fmt"
    "math"
)

func main() {
    var st float64 = 1980
    var salePrice1 = st * 0.1 / 1.1
    fmt.Printf("%T:%v\n", salePrice1, salePrice1) // 179.9999
    var salePrice2 = math.Floor(st * 0.1 / 1.1)
    fmt.Printf("%T:%v\n", salePrice2, salePrice2) // 179
}
Playground: https://play.golang.org/p/49TjJwwEdEJ
Output:
float64:179.99999999999997
float64:179
I expect the output of 1980 * 0.1 / 1.1 to be 180, but the actual output is 179.
The original question, "Incorrect floor number in golang", repeated the same code and output.
The XY problem is asking about your attempted solution rather than your actual problem: see The XY Problem.
Clearly, this is a money calculation for salePrice1. Money calculations use precise decimal calculations, not imprecise binary floating-point calculations.
For money calculations use integers. For example,
package main

import "fmt"

func main() {
    var st int64 = 198000 // $1980.00 as cents
    fmt.Printf("%[1]T:%[1]v\n", st)
    fmt.Printf("$%d.%02d\n", st/100, st%100)
    var n, d int64 = 1, 11
    fmt.Printf("%d, %d\n", n, d)
    var salePrice1 int64 = (st * n) / d // round down
    fmt.Printf("%[1]T:%[1]v\n", salePrice1)
    fmt.Printf("$%d.%02d\n", salePrice1/100, salePrice1%100)
    var salePrice2 int64 = ((st*n)*10/d + 5) / 10 // round half up
    fmt.Printf("%[1]T:%[1]v\n", salePrice2)
    fmt.Printf("$%d.%02d\n", salePrice2/100, salePrice2%100)
    var salePrice3 int64 = (st*n + (d - 1)) / d // round up
    fmt.Printf("%[1]T:%[1]v\n", salePrice3)
    fmt.Printf("$%d.%02d\n", salePrice3/100, salePrice3%100)
}
Playground: https://play.golang.org/p/HbqVJUXXR-N
Output:
int64:198000
$1980.00
1, 11
int64:18000
$180.00
int64:18000
$180.00
int64:18000
$180.00
References:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
How should we calc money (decimal, big.Float)
General Decimal Arithmetic
Try this:
st := 1980.0
f := 0.1 / 1.1
salePrice1 := st * f
salePrice2 := math.Floor(salePrice1)
fmt.Println(salePrice2) // 180
It is a big topic. For accounting systems, the answer is floating-point error mitigation.
(Note: one mitigation technique is to use int64, uint64, or big.Int; a rational-arithmetic sketch follows below.)
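For instance, here is a minimal sketch (my own illustration, assuming exact rational arithmetic with the standard math/big package is acceptable for your use case):

package main

import (
    "fmt"
    "math/big"
)

func main() {
    // 1980 * (0.1 / 1.1) is exactly 1980/11 = 180 in rational arithmetic.
    st := big.NewRat(1980, 1)
    f := new(big.Rat).Quo(big.NewRat(1, 10), big.NewRat(11, 10)) // exactly 1/11
    price := new(big.Rat).Mul(st, f)
    fmt.Println(price.RatString())    // 180
    fmt.Println(price.FloatString(2)) // 180.00
}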
And see:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
https://en.wikipedia.org/wiki/Double-precision_floating-point_format
https://en.wikipedia.org/wiki/IEEE_floating_point
Let's start with:
fmt.Println(1.0 / 3.0) // 0.3333333333333333
IEEE 754 binary representation:
fmt.Printf("%#X\n", math.Float64bits(1.0/3.0)) // 0X3FD5555555555555
IEEE 754 binary representation of 1.1, and of st*0.1/1.1:
fmt.Printf("%#X\n", math.Float64bits(1.1))        // 0X3FF199999999999A
fmt.Printf("%#X\n", math.Float64bits(st*0.1/1.1)) // 0X40667FFFFFFFFFFF
Now, let:
st := 1980.0
f := 0.1 / 1.1
IEEE 754 binary representation of f is:
fmt.Printf("%#X\n", math.Float64bits(f)) // 0X3FB745D1745D1746
And:
salePrice1 := st * f
fmt.Println(salePrice1) // 180
fmt.Printf("%#X\n", math.Float64bits(salePrice1)) // 0X4066800000000000
salePrice2 := math.Floor(salePrice1)
fmt.Printf("%#X\n", math.Float64bits(salePrice2)) // 0X4066800000000000
Working with floating-point numbers on a computer is not the same as working with them on pen and paper (floating-point calculation errors):
var st float64 = 1980
var salePrice1 = st * 0.1 / 1.1
fmt.Println(salePrice1) // 179.99999999999997
salePrice1 is 179.99999999999997, not 180.0, so the greatest integer value less than or equal to it is 179.
See the documentation for func Floor(x float64) float64:
Floor returns the greatest integer value less than or equal to x.
See:
fmt.Println(math.Floor(179.999)) // 179
fmt.Println(math.Floor(179.5 + 0.5)) // 180
fmt.Println(math.Floor(179.999 + 0.5)) // 180
fmt.Println(math.Floor(180.0)) // 180
Some relevant QAs:
Golang floating point precision float32 vs float64
How to change a float64 number to uint64 in a right way?
Golang converting float64 to int error
Is floating point math broken?
Go float comparison
What does "%b" do in fmt.Printf for float64 and what is Min subnormal positive double in float64 in binary format?
Is there any standard library to convert float64 to string with fix width with maximum number of significant digits?
fmt.Printf with width and precision fields in %g behaves unexpectedly
Why is there a difference between floating-point multiplication with literals vs. variables in Go?
Golang Round to Nearest 0.05

Golang shift operator conversion

I cannot understand how, in Go, 1<<s returns 0 when var s uint = 33, yet 1<<33 returns 8589934592. How can a shift operation end up with a value of 0?
I'm reading the language specification and stuck in this section:
https://golang.org/ref/spec#Operators
Specifically this paragraph from docs:
"The right operand in a shift expression must have unsigned integer
type or be an untyped constant representable by a value of type uint.
If the left operand of a non-constant shift expression is an untyped
constant, it is first implicitly converted to the type it would assume
if the shift expression were replaced by its left operand alone."
An example from the official Go docs:
var s uint = 33
var i = 1<<s // 1 has type int
var j int32 = 1<<s // 1 has type int32; j == 0
var k = uint64(1<<s) // 1 has type uint64; k == 1<<33
Update:
Another very related question, with an example:
package main

import (
    "fmt"
)

func main() {
    v := int16(4336)
    fmt.Println(int8(v))
}
This program prints -16.
How does the number 4336 become -16 when converting from int16 to int8?
If you have this:
var s uint = 33
fmt.Println(1 << s)
Then the quoted part applies:
If the left operand of a non-constant shift expression is an untyped constant, it is first implicitly converted to the type it would assume if the shift expression were replaced by its left operand alone.
Because s is not a constant (it's a variable), 1 << s is a non-constant shift expression. Its left operand 1 is an untyped constant (e.g. int(1) would be a typed constant), so it is converted to the type it would get if the expression were simply 1 instead of 1 << s:
fmt.Println(1)
In the above, the untyped constant 1 is converted to int, because that is its default type. The default type of a constant is given in Spec: Constants:
An untyped constant has a default type which is the type to which the constant is implicitly converted in contexts where a typed value is required, for instance, in a short variable declaration such as i := 0 where there is no explicit type. The default type of an untyped constant is bool, rune, int, float64, complex128 or string respectively, depending on whether it is a boolean, rune, integer, floating-point, complex, or string constant.
And the result of the above is architecture dependent. If int is 32 bits, it will be 0. If int is 64 bits, it will be 8589934592 (because shifting a 1 bit 33 times will shift it out of a 32-bit int number).
On the Go playground, size of int is 32 bits (4 bytes). See this example:
fmt.Println("int size:", unsafe.Sizeof(int(0)))
var s uint = 33
fmt.Println(1 << s)
fmt.Println(int32(1) << s)
fmt.Println(int64(1) << s)
The above outputs (try it on the Go Playground):
int size: 4
0
0
8589934592
If I run the above app on my 64-bit computer, the output is:
int size: 8
8589934592
0
8589934592
Also see The Go Blog: Constants for how constants work in Go.
Note that if you write 1 << 33, that is not the same: it is not a non-constant shift expression, which your quote applies to ("the left operand of a non-constant shift expression"). 1<<33 is a constant shift expression, evaluated in "constant space", and its result is converted to int; it does not fit into a 32-bit int, hence the compile-time error. It works with variables because variables can overflow. Constants do not overflow:
Numeric constants represent exact values of arbitrary precision and do not overflow.
See How does Go perform arithmetic on constants?
Update:
Answering your addition: converting from int16 to int8 simply keeps the lowest 8 bits. And integers are represented using the 2's complement format, where the highest bit is 1 if the number is negative.
This is detailed in Spec: Conversions:
When converting between integer types, if the value is a signed integer, it is sign extended to implicit infinite precision; otherwise it is zero extended. It is then truncated to fit in the result type's size. For example, if v := uint16(0x10F0), then uint32(int8(v)) == 0xFFFFFFF0. The conversion always yields a valid value; there is no indication of overflow.
So when you convert a int16 value to int8, if source number has a 1 in bit position 7 (8th bit), the result will be negative, even if the source wasn't negative. Similarly, if the source has 0 at bit position 7, the result will be positive, even if the source is negative.
See this example:
for _, v := range []int16{4336, -129, 8079} {
    fmt.Printf("Source : %v\n", v)
    fmt.Printf("Source hex: %4x\n", uint16(v))
    fmt.Printf("Result hex: %4x\n", uint8(int8(v)))
    fmt.Printf("Result : %4v\n", int8(v))
    fmt.Println()
}
Output (try it on the Go Playground):
Source : 4336
Source hex: 10f0
Result hex: f0
Result : -16
Source : -129
Source hex: ff7f
Result hex: 7f
Result : 127
Source : 8079
Source hex: 1f8f
Result hex: 8f
Result : -113
See related questions:
When casting an int64 to uint64, is the sign retained?
Format printing the 64bit integer -1 as hexadecimal deviates between golang and C
You're building and running the program in 32-bit mode (the Go Playground?). There, int is 32 bits wide and behaves the same as int32.

Convert uint64 to int64 without loss of information

The problem with the following code:
var x uint64 = 18446744073709551615
var y int64 = int64(x)
is that y is -1. Without loss of information, is the only way to convert between these two number types to use an encoder and decoder?
var buff bytes.Buffer
Encoder(buff).encode(x)
Decoder(buff).decode(y)
Note, I am not attempting a straight numeric conversion in your typical case. I am more concerned with maintaining the statistical properties of a random number generator.
Your conversion does not lose any information; all the bits are untouched. It is just that:
uint64(18446744073709551615) = 0xFFFFFFFFFFFFFFFF
int64(-1) = 0xFFFFFFFFFFFFFFFF
Try:
var x uint64 = 18446744073709551615 - 3
and you will have y = -4.
For instance: playground
var x uint64 = 18446744073709551615 - 3
var y int64 = int64(x)
fmt.Printf("%b\n", x)
fmt.Printf("%b or %d\n", y, y)
Output:
1111111111111111111111111111111111111111111111111111111111111100
-100 or -4
Seeing -1 would be consistent with a process running in 32-bit mode.
See for instance the Go 1.1 release notes (which made int 64 bits wide on 64-bit platforms):
x := ^uint32(0) // x is 0xffffffff
i := int(x) // i is -1 on 32-bit systems, 0xffffffff on 64-bit
fmt.Println(i)
Using fmt.Printf("%b\n", y) can help to see what is going on (see ANisus' answer)
As it turned out, the OP wheaties confirmed (in the comments) that it was run initially in 32 bits (hence this answer), but then realized that 18446744073709551615 is 0xffffffffffffffff (-1) anyway: see ANisus' answer.
The types uint64 and int64 can both represent 2^64 discrete integer values.
The difference between the two is that uint64 holds only non-negative integers (0 through 2^64-1), whereas int64 holds both negative and positive integers, using 1 bit for the sign (-2^63 through 2^63-1).
As others have said, if your generator is producing 0xffffffffffffffff, uint64 will represent this as the raw integer (18,446,744,073,709,551,615) whereas int64 will interpret the two's complement value and return -1.
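To see that nothing is lost for RNG purposes, here is a quick sketch (my own, not from the answers above) showing that the round trip is bit-exact:

package main

import "fmt"

func main() {
    var x uint64 = 18446744073709551615
    y := int64(x)  // -1: the same 64 bits, reinterpreted
    z := uint64(y) // back to 18446744073709551615
    fmt.Println(x == z)                  // true
    fmt.Printf("%x\n%x\n", x, uint64(y)) // identical bit patterns
}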
