Byte ordering of floats - go

I'm working on Tormenta (https://github.com/jpincas/tormenta) which is backed by BadgerDB (https://github.com/dgraph-io/badger). BadgerDB stores keys (slices of bytes) in byte order. I am creating keys which contain floats, and these need to be stored in numeric order so I can use Badger's key iteration properly. I don't have a solid CS background so I'm a little out of my depth.
I encode the floats like this: binary.Write(buf, binary.BigEndian, myFloat). This works fine for positive floats - the key order is what you'd expect, but the byte ordering breaks down for negative floats.
As an aside, ints present the same problem, but I was able to fix that relatively easily by flipping the sign bit on the int with b[0] ^= 1 << 7 (where b is the []byte holding the result of encoding the int), and then flipping back when retrieving the key.
Although b[0] ^= 1 << 7 DOES also flip the sign bit on floats and thus places all the negative floats before the positive ones, the negative ones are incorrectly (backwards) ordered. It is necessary to flip the sign bit and reverse the order of the negative floats.
A similar question was asked on StackOverflow here: Sorting floating-point values using their byte-representation, and the solution was agreed to be:
XOR all positive numbers with 0x8000... and negative numbers with 0xffff.... This should flip the sign bit on both (so negative numbers go first), and then reverse the ordering on negative numbers.
However, that's way above my bit-flipping-skills level, so I was hoping a Go bit-ninja could help me translate that into some Go code.

Using math.Float64bits()
You could use math.Float64bits(), which returns a uint64 value having the same bytes / bits as the float64 value passed to it.
Once you have a uint64, performing bitwise operations on it is trivial:
f := 1.0 // Some float64 value
bits := math.Float64bits(f)
if f >= 0 {
bits ^= 0x8000000000000000
} else {
bits ^= 0xffffffffffffffff
}
Then serialize the bits value instead of the f float64 value, and you're done.
Let's see this in action. Let's create a wrapper type holding a float64 number and its bytes:
type num struct {
f float64
data [8]byte
}
Let's create a slice of these nums:
nums := []*num{
{f: 1.0},
{f: 2.0},
{f: 0.0},
{f: -1.0},
{f: -2.0},
{f: math.Pi},
}
Serializing them:
for _, n := range nums {
bits := math.Float64bits(n.f)
if n.f >= 0 {
bits ^= 0x8000000000000000
} else {
bits ^= 0xffffffffffffffff
}
if err := binary.Write(bytes.NewBuffer(n.data[:0]), binary.BigEndian, bits); err != nil {
panic(err)
}
}
This is how we can sort them byte-wise:
sort.Slice(nums, func(i int, j int) bool {
ni, nj := nums[i], nums[j]
for k := range ni.data {
if bi, bj := ni.data[k], nj.data[k]; bi < bj {
return true // We're certain it's less
} else if bi > bj {
return false // We're certain it's not less
} // We have to check the next byte
}
return false // If we got this far, they are equal (=> not less)
})
And now let's see the order after byte-wise sorting:
fmt.Println("Final order byte-wise:")
for _, n := range nums {
fmt.Printf("% .7f %3v\n", n.f, n.data)
}
Output will be:
Final order byte-wise:
-2.0000000 [ 63 255 255 255 255 255 255 255]
-1.0000000 [ 64 15 255 255 255 255 255 255]
0.0000000 [128 0 0 0 0 0 0 0]
1.0000000 [191 240 0 0 0 0 0 0]
2.0000000 [192 0 0 0 0 0 0 0]
3.1415927 [192 9 33 251 84 68 45 24]
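Since the question also mentions flipping back when retrieving the key, here is a minimal sketch of an encode/decode pair. The names encodeFloat64Key and decodeFloat64Key are hypothetical, not part of Tormenta or Badger. It tests the sign bit directly instead of comparing with `>= 0`; that is an assumption made here so that negative zero (which compares equal to zero but has its sign bit set) lands right next to positive zero instead of before all negative numbers. NaN values are not treated specially.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
)

// encodeFloat64Key (hypothetical helper) returns 8 bytes whose
// lexicographic order matches the numeric order of f.
func encodeFloat64Key(f float64) [8]byte {
	bits := math.Float64bits(f)
	if bits&(1<<63) == 0 {
		bits ^= 1 << 63 // sign bit clear: flip only the sign bit
	} else {
		bits = ^bits // sign bit set: flip every bit
	}
	var key [8]byte
	binary.BigEndian.PutUint64(key[:], bits)
	return key
}

// decodeFloat64Key (hypothetical helper) reverses encodeFloat64Key.
func decodeFloat64Key(key [8]byte) float64 {
	bits := binary.BigEndian.Uint64(key[:])
	if bits&(1<<63) != 0 {
		bits ^= 1 << 63 // encoded sign bit set: original was non-negative
	} else {
		bits = ^bits // encoded sign bit clear: original was negative
	}
	return math.Float64frombits(bits)
}

func main() {
	values := []float64{-2, -1, 0, 1, 2, math.Pi}
	for i, f := range values {
		key := encodeFloat64Key(f)
		fmt.Printf("% .7f %3v -> % .7f\n", f, key, decodeFloat64Key(key))
		if i > 0 {
			prev := encodeFloat64Key(values[i-1])
			fmt.Println("  sorts after previous key:", bytes.Compare(prev[:], key[:]) < 0)
		}
	}
}
```

The encoded sign bit tells the decoder which branch was taken, so no extra flag byte is needed in the key.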
Without math.Float64bits()
Another option would be to first serialize the float64 value, and then perform XOR operations on the bytes.
If the number is positive (or zero), XOR the first byte with 0x80, and the rest with 0x00, which leaves them unchanged.
If the number is negative, XOR all the bytes with 0xff, which is basically a bitwise negation.
In action: the only part that is different is the serialization and XOR operations:
for _, n := range nums {
if err := binary.Write(bytes.NewBuffer(n.data[:0]), binary.BigEndian, n.f); err != nil {
panic(err)
}
if n.f >= 0 {
n.data[0] ^= 0x80
} else {
for i, b := range n.data {
n.data[i] = ^b
}
}
}
The rest is the same. The output will also be the same.
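As a side note, the hand-written byte comparison used for sorting could probably be replaced with bytes.Compare from the standard library, which compares two byte slices lexicographically. The following is only a sketch of that idea built on the byte-level variant; sortableBytes and sortedByKey are hypothetical names, and the sign is read from the top bit of the first byte.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
	"sort"
)

// sortableBytes (hypothetical helper) serializes f big-endian and then
// XORs the bytes, as in the byte-level variant described above.
func sortableBytes(f float64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, math.Float64bits(f))
	if b[0]&0x80 == 0 {
		b[0] ^= 0x80 // non-negative: flip the sign bit only
	} else {
		for i := range b {
			b[i] = ^b[i] // negative: flip every bit
		}
	}
	return b
}

// sortedByKey returns a copy of fs ordered by the encoded keys.
func sortedByKey(fs []float64) []float64 {
	out := append([]float64(nil), fs...)
	sort.Slice(out, func(i, j int) bool {
		return bytes.Compare(sortableBytes(out[i]), sortableBytes(out[j])) < 0
	})
	return out
}

func main() {
	fmt.Println(sortedByKey([]float64{1, 2, 0, -1, -2, math.Pi}))
}
```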

Related

Splitting big.Int by digit

I'm trying to split a big.Int into a number of int64s such that each is a portion of the larger number, with a standard offset of 18 digits. For example, given the following input value of 1234512351234088800000999, I would expect the following output: [351234088800000999, 1234512]. For negative numbers, I would expect all of the parts to be negative (i.e. -1234512351234088800000999 produces [-351234088800000999, -1234512]).
I already know I can do this to get the result I want:
func Split(input *big.Int) []int64 {
const width = 18
asStr := input.Text(10)
strLen := len(asStr)
offset := 0
if input.Sign() < 0 {
offset = 1
}
length := int(math.Ceil(float64(strLen-offset) / width))
ints := make([]int64, length)
for i := 1; i <= length; i++ {
start := strLen - (i * width)
end := start + width
if start < 0 || (start == 1 && asStr[0] == '-') {
start = 0
}
ints[i-1], _ = strconv.ParseInt(asStr[start:end], 10, 64)
if offset == 1 && ints[i-1] > 0 {
ints[i-1] = 0 - ints[i-1]
}
}
return ints
}
However, I don't like the idea of using string-parsing nor do I like the use of strconv. Is there a way I can do this utilizing the big.Int directly?
You can use the DivMod function to do what you need here, with some special care to handle negative numbers:
var offset = big.NewInt(1e18)
func Split(input *big.Int) []int64 {
rest := new(big.Int)
rest.Abs(input)
var ints []int64
r := new(big.Int)
for {
rest.DivMod(rest, offset, r)
ints = append(ints, r.Int64() * int64(input.Sign()))
if rest.BitLen() == 0 {
break
}
}
return ints
}
Multiplying each remainder by input.Sign() ensures that every part is negative when the input is negative. Summing each part multiplied by 1e18 raised to the power of its index in the slice gives back the input.
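That last property can be checked with a small round trip. The sketch below repeats the DivMod approach and adds two helpers that are not part of the original answer: Join, a hypothetical inverse that rebuilds the number, and mustParse, a convenience for the demo.

```go
package main

import (
	"fmt"
	"math/big"
)

var offset = big.NewInt(1e18)

// Split follows the DivMod approach from the answer above.
func Split(input *big.Int) []int64 {
	rest := new(big.Int).Abs(input)
	sign := int64(input.Sign())
	var ints []int64
	r := new(big.Int)
	for {
		rest.DivMod(rest, offset, r)
		ints = append(ints, r.Int64()*sign)
		if rest.Sign() == 0 {
			break
		}
	}
	return ints
}

// Join (hypothetical helper) computes the sum of parts[i] * (1e18)^i,
// working from the most significant part down.
func Join(parts []int64) *big.Int {
	total := new(big.Int)
	for i := len(parts) - 1; i >= 0; i-- {
		total.Mul(total, offset)
		total.Add(total, big.NewInt(parts[i]))
	}
	return total
}

// mustParse (hypothetical helper) parses a base-10 integer or panics.
func mustParse(s string) *big.Int {
	n, ok := new(big.Int).SetString(s, 10)
	if !ok {
		panic("invalid decimal integer: " + s)
	}
	return n
}

func main() {
	for _, s := range []string{"1234512351234088800000999", "-1234512351234088800000999", "0"} {
		parts := Split(mustParse(s))
		fmt.Println(parts, Join(parts))
	}
}
```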

How to write LEB128 in Go

How do you write an integer to LEB128 format in Go? I'm trying to encode an int32 to a Minecraft VarInt, and so far I've tried porting the example on the wiki to Go. I get the wrong results when testing though: the wiki says -1 should equal [255 255 255 255 15], but I get [255 255 255 255 255] instead. What am I doing wrong here?
func WriteVarInt2(v int32) []byte{
var out []byte
c := 0
for{
currentByte := byte(v & 0b01111111)
v >>= 7
if v != 0 {
currentByte |= 0b10000000
}
out = append(out, currentByte)
c++
if c >= 5 || v == 0{
return out
}
}
}
The problem is with the shifting operation.
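In languages that have both operators (Java, for example), >> is arithmetic shift right and >>> is logical shift right. The difference is that >> brings in the sign bit (on the left), while >>> brings in zeros (whatever the sign bit was).
The algorithm of LEB128's Varint uses logical shift, but Go's >> on a signed integer such as int32 is an arithmetic shift. With v = -1 the value therefore never becomes 0, the continuation bit is set on every byte, and the loop only stops because of the 5-byte limit.
There is no distinct logical shift operator in Go, but >> on an unsigned integer shifts in zeros, so if you treat the number as unsigned, you'll get exactly that: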
func WriteVarInt2(v_ int32) []byte {
v := uint32(v_)
// rest of your function unchanged
// ...
}
Testing it:
fmt.Println(WriteVarInt2(-1))
Output is as expected:
[255 255 255 255 15]
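For reference, a complete version might look like the sketch below. The name writeVarInt is made up here, and the byte counter from the question is dropped on the assumption that it is no longer needed: a uint32 shifted right by 7 bits reaches zero after at most five iterations.

```go
package main

import "fmt"

// writeVarInt (hypothetical name) encodes value as a VarInt,
// 7 bits per byte, least significant group first.
func writeVarInt(value int32) []byte {
	v := uint32(value) // reinterpret the bits so that >> shifts in zeros
	var out []byte
	for {
		b := byte(v & 0x7f) // lowest 7 bits
		v >>= 7
		if v != 0 {
			b |= 0x80 // more bytes follow: set the continuation bit
		}
		out = append(out, b)
		if v == 0 {
			return out
		}
	}
}

func main() {
	for _, n := range []int32{0, 1, 127, 128, 255, -1} {
		fmt.Println(n, writeVarInt(n))
	}
}
```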

Convert string of numbers into 'binary representation'

I recently did the "Winning Lottery Ticket" coding challenge on HackerRank.
https://www.hackerrank.com/challenges/winning-lottery-ticket/
The idea is to count the pairs of lines which together contain all digits from 0-9; in the example below there are 5 such pairs in total.
56789
129300455
5559948277
012334556
123456879
The idea is to change the representation to something that makes it quicker to check whether all digits are contained.
Example representation:
1278 --> 0110000110
Example with using the first two lines from above:
56789129300455 --> 1111111111
When checking whether a digit is contained in the concatenation of two lines, I can abort as soon as I encounter a zero, because that pair cannot contain all of 0-9.
This logic works, but it fails when there is a huge number of lines to compare.
// Go code
func winningLotteryTicket(tickets []string) int {
counter := 0
for i := 0; i < len(tickets); i++ {
for j := i + 1; j < len(tickets); j++ {
if err := bitMask(fmt.Sprintf("%v%v", tickets[i], tickets[j])); err == nil {
counter++
}
}
}
return counter
}
func bitMask(s string) error {
for i := 0; i <= 9; i++ {
if !strings.Contains(s, strconv.Itoa(i)) {
return errors.New("No Pair")
}
}
return nil
}
I'm not sure if this representation is called a bitmask; if not, please correct me and I will adjust this post.
From my point of view there is no way to improve the performance of the string concatenation, because I will have to check each combination.
As for checking whether a digit is contained in the string in the function "bitMask", I'm not sure.
Do you have an idea how this could perform better?
Bit masks are integers, not strings of ones and zeros. It's called a bitmask because we're not interested in the numerical value of these integers but only in the bit pattern. We can use bitwise operations on integers and those are really fast because they are implemented in hardware, directly in the CPU.
Here is a function that turns a string into an actual bitmask, with each one-bit signaling that a particular digit is present in the string:
func mask(s string) uint16 {
// We need ten bits, one for each possible decimal digit in s, so int16 and
// uint16 are the smallest possible integer types that fit. For bitmasks it
// is typical to select an unsigned type because the sign bit doesn't have
// any meaning. As I said earlier, mask's numerical value is irrelevant.
var mask uint16
for _, c := range s {
switch c {
case '0':
mask |= 0b0000000001
case '1':
mask |= 0b0000000010
case '2':
mask |= 0b0000000100
case '3':
mask |= 0b0000001000
case '4':
mask |= 0b0000010000
case '5':
mask |= 0b0000100000
case '6':
mask |= 0b0001000000
case '7':
mask |= 0b0010000000
case '8':
mask |= 0b0100000000
case '9':
mask |= 0b1000000000
}
}
return mask
}
This is rather verbose, but it should be pretty obvious what happens.
Note that the binary number literals can be replaced with bit shifts:
0b0000000001 is the same as 1<<0 (1 shifted zero times to the left)
0b0000000010 is the same as 1<<1 (1 shifted one time to the left)
0b0000000100 is the same as 1<<2 (1 shifted two times to the left), and so on
Using this, and taking advantage of the fact that the bytes '0' through '9' are themselves just integers (48 through 57 in decimal, given by their place in the ASCII table), we can shorten this function like so:
func mask(s string) uint16 {
var mask uint16
for _, c := range s {
if '0' <= c && c <= '9' {
mask |= 1 << (c - '0')
}
}
return mask
}
To check two lines, then, all we have to do is OR the masks for the lines and compare to 0b1111111111 (i.e. check if all ten bits are set):
package main
import "fmt"
func main() {
a := "56789"
b := "129300455"
mA := mask(a)
mB := mask(b)
fmt.Printf("mask(%11q) = %010b\n", a, mA)
fmt.Printf("mask(%11q) = %010b\n", b, mB)
fmt.Printf("combined = %010b\n", mA|mB)
fmt.Printf("all digits present: %v\n", mA|mB == 0b1111111111)
}
func mask(s string) uint16 {
var mask uint16
for _, c := range s {
if '0' <= c && c <= '9' {
mask |= 1 << (c - '0')
}
}
return mask
}
mask( "56789") = 1111100000
mask("129300455") = 1000111111
combined = 1111111111
all digits present: true
Try it on the playground: https://play.golang.org/p/mr1KqnC9phB
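To get from here to the actual challenge, one possible approach (an assumption on my part, not something the challenge prescribes) is to count how many tickets produce each of the 1024 possible masks and then pair up masks instead of tickets. The function name countWinningPairs is hypothetical.

```go
package main

import "fmt"

const allDigits = 1<<10 - 1 // 0b1111111111: all ten digit bits set

// mask sets bit d for every decimal digit d that occurs in s.
func mask(s string) uint16 {
	var m uint16
	for _, c := range s {
		if '0' <= c && c <= '9' {
			m |= 1 << uint(c-'0')
		}
	}
	return m
}

// countWinningPairs (hypothetical name) counts unordered pairs of
// tickets whose combined digits cover 0 through 9.
func countWinningPairs(tickets []string) int {
	var counts [allDigits + 1]int // counts[m] = number of tickets with mask m
	for _, t := range tickets {
		counts[mask(t)]++
	}
	pairs := 0
	for a := 0; a <= allDigits; a++ {
		if counts[a] == 0 {
			continue
		}
		for b := a + 1; b <= allDigits; b++ {
			if a|b == allDigits {
				pairs += counts[a] * counts[b]
			}
		}
	}
	// Two tickets with the same mask only win if that mask is already
	// complete, so those pairs are counted separately.
	n := counts[allDigits]
	pairs += n * (n - 1) / 2
	return pairs
}

func main() {
	tickets := []string{"56789", "129300455", "5559948277", "012334556", "123456879"}
	fmt.Println(countWinningPairs(tickets)) // prints 5
}
```

Each ticket string is scanned once, and the pairing loop runs over at most 1024 × 1024 mask combinations regardless of how many tickets there are.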

Sign of integer and binary AND check

If I'm right, in C++ the LSB is the last bit and determines the sign of the integer. So for example, in the case of 8 bits, 1111 1111 will be -127 and 1111 1110 will be 127. Please correct me if I'm wrong, but it's not related.
I would like to check for the sign of the integer in Go, so I wrote the following:
func signCheck(n int8) {
var sign int8 = 1<<7
if n&sign == 1 {
fmt.Printf("%d is odd - %b\n", n, n)
} else {
fmt.Printf("%d is even - %b\n", n, n)
}
}
This fails to compile with "constant 128 overflows int8", but that makes sense because only 7 bits are used to determine the number. So I modified it as follows:
func signCheck(n int8) {
if n&1<<7 == 1 {
fmt.Printf("%d is odd - %b\n", n, n)
} else {
fmt.Printf("%d is even - %b\n", n, n)
}
}
In this case I don't have to say it's an int8 but I tested it with -127 and 127 and got the following prints:
-127 is even - -1111111
127 is even - 1111111
So in that case how should I check for the sign?
go version go1.13.1 linux/amd64
To represent negative integers, Go (and I believe C++ too like most languages) uses the 2's complement format and standard.
In 2's complement the highest bit (MSB) will be 1 if the number is negative, and 0 otherwise.
In Go you can't use a 0x80 typed constant having int8 as its type, because that's outside of its valid range.
You may however convert int8 to uint8 "losslessly", and then you may do so.
Also, when you're masking the 0x80 bit, you have to compare the result to 0x80, because x & 0x80 can never result in 1, only in 0 or 0x80.
func signCheck(n int8) {
if uint8(n)&0x80 == 0x80 {
fmt.Printf("%d is negative - %b\n", n, n)
} else {
fmt.Printf("%d is non-negative - %b\n", n, n)
}
}
Another option is to compare the masking result to 0:
if uint8(n)&0x80 != 0 { }
Both solutions give the output:
-127 is negative - -1111111
127 is non-negative - 1111111
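Two more observations may help. In the question's second attempt, n&1<<7 is parsed as (n&1)<<7, because & and << have the same precedence in Go and associate left to right, so the comparison with 1 can never succeed. Also, %b applied to an int8 prints a minus sign followed by the magnitude, not the stored bit pattern; converting to uint8 first shows the actual bits. The sketch below (isNegativeByBit is a made-up name) compares the bit test with the ordinary n < 0 comparison, which gives the same answer.

```go
package main

import "fmt"

// isNegativeByBit (hypothetical name) tests the most significant bit
// of the two's complement representation of n.
func isNegativeByBit(n int8) bool {
	return uint8(n)&0x80 != 0
}

func main() {
	for _, n := range []int8{-128, -127, -1, 0, 1, 127} {
		// uint8(n) with %08b prints the stored bits, e.g. 10000001 for -127.
		fmt.Printf("%4d  bits=%08b  n<0=%-5v  msb=%v\n", n, uint8(n), n < 0, isNegativeByBit(n))
	}
}
```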

Clearing the most significant bit

I have a file containing two bytes, in Big Endian order, hexdump gives me:
81 50
which is 1000 0001 0101 0000 in binary. However, I want the most significant bit to be a flag, so in golang I have to load the file content, clear the most significant bit, and only then read the value.
So:
valueBuf := make([]byte, 2)
_, err := f.Read(valueBuf) // now printing valueBuf gives me [129 80] in decimal
value := int16(binary.BigEndian.Uint16(valueBuf[0:2])) // now value is -32432
Ok, I have tried to use something like:
func clearBit(n int16, pos uint) int16 {
mask := ^(1 << pos)
n &= mask
return n
}
But it apparently doesn't work as expected. The output value should be 336 in decimal, as normal int, and I cannot get it. How should I do this?
For n &= mask to work, n and mask have to be matching types. As written, mask := ^(1 << pos) gives mask the type int, which cannot be combined with an int16. So you should write
mask := int16(^(1 << pos))
then, value = clearBit(value, 15) works fine.
Or, since constants are untyped, you can eliminate mask, and also eliminate the assignment to n since it's just returned on the following line, and shorten clearBit to
func clearBit(n int16, pos uint) int16 {
return n & ^(1 << pos)
}
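An alternative, sketched here under the assumption that the flag itself is also of interest, is to work on the unsigned value before any conversion to int16. Go's &^ (AND NOT) operator clears the bits that are set in its right operand. The helper name splitFlag is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// splitFlag (hypothetical helper) separates the top bit of a big-endian
// 2-byte field from the remaining 15 bits.
func splitFlag(buf []byte) (flag bool, value uint16) {
	raw := binary.BigEndian.Uint16(buf)
	flag = raw&(1<<15) != 0  // is the most significant bit set?
	value = raw &^ (1 << 15) // AND NOT clears bit 15
	return flag, value
}

func main() {
	flag, value := splitFlag([]byte{0x81, 0x50})
	fmt.Println(flag, value) // prints: true 336
}
```

Because the value stays unsigned until the flag bit is gone, it never passes through a negative int16.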
