With Go, how to append an unknown number of bytes into a vector and get a slice of bytes? - go

I'm trying to encode a large number into a list of bytes (uint8 in Go).
The number of bytes is unknown, so I'd like to use a vector.
But Go doesn't provide a vector of byte; what can I do?
And is it possible to get a slice of such a byte vector?
I intend to implement data compression. Instead of storing small and large numbers with the same number of bytes,
I'm implementing a variable-byte encoding that uses fewer bytes for small numbers
and more bytes for large numbers.
My code does not compile: invalid type assertion:
package main

import (
    //"fmt"
    "container/vector"
)

func vbEncodeNumber(n uint) []byte {
    bytes := new(vector.Vector)
    for {
        bytes.Push(n % 128)
        if n < 128 {
            break
        }
        n /= 128
    }
    bytes.Set(bytes.Len()-1, bytes.Last().(byte)+byte(128))
    return bytes.Data().([]byte) // <-
}

func main() { vbEncodeNumber(10000) }
I wish to write a lot of such encoded numbers into a binary file,
so I'd like the function to return a byte array.
I haven't found a code example on vector.

Since you're trying to represent large numbers, you might see if the big package serves your purposes.
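For what it's worth, here is a minimal sketch of that idea (an assumption about the use case, not the asker's code): a big.Int holds an arbitrarily large value, and its Bytes() method returns the big-endian byte representation.

package main

import (
    "fmt"
    "math/big"
)

func main() {
    n := new(big.Int).SetUint64(10000)
    fmt.Println(n.Bytes()) // [39 16], i.e. 0x27 0x10
}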
The general Vector struct can be used to store bytes. It accepts an empty interface as its type, and any other type satisfies that interface. You can retrieve a slice of interfaces through the Data method, but there's no way to convert that to a slice of bytes without copying it. You can't use a type assertion to turn a slice of interface{} into a slice of something else. You'd have to do something like the following at the end of your function (I haven't tried compiling this code because I can't right now):
byteSlice := make([]byte, bytes.Len())
for i := range byteSlice {
    byteSlice[i] = bytes.At(i).(byte)
}
return byteSlice

Take a look at the bytes package and the Buffer type there. You can write your ints as bytes into the buffer and then you can use the Bytes() method to access byte slices of the buffer.
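For reference, a minimal sketch of that approach, reusing the base-128 loop from the question (the final-byte marker mirrors the question's scheme and is an assumption about the desired format):

package main

import (
    "bytes"
    "fmt"
)

func vbEncodeNumber(n uint) []byte {
    var buf bytes.Buffer
    for {
        buf.WriteByte(byte(n % 128)) // low 7-bit group first
        if n < 128 {
            break
        }
        n /= 128
    }
    b := buf.Bytes()
    b[len(b)-1] += 128 // mark the last byte, as in the question
    return b
}

func main() {
    fmt.Println(vbEncodeNumber(10000)) // [16 206]
}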

I've found the vectors to be a lot less useful since the generic append and copy were added to the language. Here's how I'd do it in one shot with less copying:
package main

import "fmt"

func vbEncodeNumber(n uint) []byte {
    bytes := make([]byte, 0, 4)
    for n > 0 {
        bytes = append(bytes, byte(n%256))
        n >>= 8
    }
    return bytes
}

func main() {
    bytes := vbEncodeNumber(10000)
    for i := len(bytes) - 1; i >= 0; i-- {
        fmt.Printf("%02x ", bytes[i])
    }
    fmt.Println("")
}
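If you specifically want the base-128 scheme described in the question (high bit set on the final byte), the same append pattern works; a sketch under that assumption:

package main

import "fmt"

// vbEncodeNumber128 emits 7-bit groups, least significant first, and sets
// the high bit (0x80) on the final byte to mark the end of the number.
func vbEncodeNumber128(n uint) []byte {
    bytes := make([]byte, 0, 4)
    for {
        bytes = append(bytes, byte(n%128))
        if n < 128 {
            break
        }
        n /= 128
    }
    bytes[len(bytes)-1] |= 0x80
    return bytes
}

func main() {
    fmt.Printf("% x\n", vbEncodeNumber128(10000)) // 10 ce
}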

Related

How to generate a Youtube ID in Go?

I'm assuming all I need to do is encode 2^64 as base64 to get an 11 character Youtube identifier. I created a Go program: https://play.golang.org/p/2nuA3JxVMd0
package main

import (
    "crypto/rand"
    "encoding/base64"
    "encoding/binary"
    "fmt"
    "math"
    "math/big"
    "strings"
)

func main() {
    // For example Youtube uses 11 characters of base64.
    // How many base64 characters would it require to express a 2^64 number?
    // (2^6)^x = 2^64, so x = 64/6 = 10.666666666 ... i.e. eleven rounded up.

    // Generate a 64 bit number
    val, _ := randint64()
    fmt.Println(val)

    // Encode the 64 bit number
    b := make([]byte, 8)
    binary.LittleEndian.PutUint64(b, uint64(val))
    encoded := base64.StdEncoding.EncodeToString(b)
    fmt.Println(encoded, len(encoded))

    // https://youtu.be/gocwRvLhDf8?t=75
    ytid := strings.ReplaceAll(encoded, "+", "-")
    ytid = strings.ReplaceAll(ytid, "/", "_")
    fmt.Println("Youtube ID from 64 bit number:", ytid)
}

func randint64() (int64, error) {
    val, err := rand.Int(rand.Reader, big.NewInt(int64(math.MaxInt64)))
    if err != nil {
        return 0, err
    }
    return val.Int64(), nil
}
But it has two issues:
The identifier is 12 characters instead of the expected 11.
The encoded base64 ends with an "=" suffix, which seems to mean it didn't have enough room to encode?
So where am I going wrong?
tl;dr
An 8-byte int64 (no matter what value) will always encode to 11 base64 bytes followed by a single padding byte =, so you can reliably do this to get your 11 character YouTube ID:
var replacer = strings.NewReplacer(
    "+", "-",
    "/", "_",
)

ytid := replacer.Replace(encoded[:11])
or (H/T #Crowman & #Peter) one can encode without padding, and without replacing + and /, by using base64.RawURLEncoding:
//encoded := base64.StdEncoding.EncodeToString(b) // may include + or /
ytid := base64.RawURLEncoding.EncodeToString(b) // produces URL-friendly - and _
https://play.golang.org/p/AjlvtfR7RWD
One byte (i.e. 8 bits) of base64 output conveys 6 bits of input. So the formula to determine the number of output bytes for a given number of input bytes is:
out = in * 8 / 6
or
out = in * 4 / 3
With a divisor of 3, this leads to partial use of output bytes in some cases. If the input length in bytes is:
divisible by 3 - the final output byte lands on a byte boundary
not divisible by 3 - the final output byte does not land on a byte boundary and requires padding
In the case of 8 bytes of input:
out = 8 * 4 / 3 = 10 2/3
will need 10 fully used output base64 bytes, plus one partially used byte (for the 2/3), so 11 base64 bytes plus padding to indicate how many bits were wasted.
Padding is indicated via the = character, and the number of = characters indicates the amount of "wasted" output:

waste    padding
=====    =======
0        (none)
1/3      =
2/3      ==

Since the output uses 10 2/3 bytes, 1/3 of a byte was "wasted", so the padding is a single =.
So base64 encoding 8 input bytes will always produce 11 base64 bytes followed by a single = padding character to produce 12 bytes in total.
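You can confirm that arithmetic with the encoding/base64 package itself; a small sketch using EncodedLen:

package main

import (
    "encoding/base64"
    "fmt"
)

func main() {
    // 8 input bytes -> 12 output bytes with padding, 11 without.
    fmt.Println(base64.StdEncoding.EncodedLen(8))    // 12
    fmt.Println(base64.RawStdEncoding.EncodedLen(8)) // 11
}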
= in base64 is padding, but for a 64-bit number this padding is redundant, so 12 characters shouldn't really be needed. Why does that happen?
See the Encoding.Encode function source:
func (enc *Encoding) Encode(dst, src []byte) {
    if len(src) == 0 {
        return
    }
    // enc is a pointer receiver, so the use of enc.encode within the hot
    // loop below means a nil check at every operation. Lift that nil check
    // outside of the loop to speed up the encoder.
    _ = enc.encode
    di, si := 0, 0
    n := (len(src) / 3) * 3
    // ...
    // https://golang.org/src/encoding/base64/base64.go
In the (len(src) / 3) * 3 part, 3 is used rather than 6, so the output of this function is always a string of even length. If your input is always 64 bits, you can drop the = after encoding and add it back before decoding.
for i := 8; i <= 18; i++ {
    b := make([]byte, i)
    binary.LittleEndian.PutUint64(b, uint64(0))
    encoded := base64.StdEncoding.EncodeToString(b)
    fmt.Println(encoded)
}
AAAAAAAAAAA=
AAAAAAAAAAAA
AAAAAAAAAAAAAA==
AAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAA==
AAAAAAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAA==
AAAAAAAAAAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAAAAAAAAAA
What do I mean by 6 (or 3)?
base64 uses 64 characters; each character maps to one 6-bit value (from 000000 to 111111).
Example:
a 64-bit value (uint64):
11154013587666973726
binary representation:
1001101011001011000001000100001011110000110001010011010000011110
split into 6-bit groups (left-padded to 66 bits):
001001,101011,001011,000001,000100,001011,110000,110001,010011,010000,011110
J, r, L, B, E, L, w, x, T, Q, e
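Here is a small sketch that reproduces the manual split above. Note that it left-pads the value to 66 bits, exactly as in this example; that is not how byte-oriented base64 groups bits, so treat it purely as an illustration of the 6-bit-to-character mapping:

package main

import (
    "fmt"
    "strings"
)

// The standard base64 alphabet: index 0 is 'A', 9 is 'J', 43 is 'r', and so on.
const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

func main() {
    var v uint64 = 11154013587666973726
    bits := fmt.Sprintf("%066b", v) // zero-pad on the left to 66 bits (11 groups of 6)
    var out strings.Builder
    for i := 0; i < len(bits); i += 6 {
        idx := 0
        for _, c := range bits[i : i+6] {
            idx = idx<<1 | int(c-'0')
        }
        out.WriteByte(alphabet[idx])
    }
    fmt.Println(out.String()) // JrLBELwxTQe
}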

How to convert an sha3 hash to a big integer in golang

I generated a hash value using sha3 and I need to convert it to a big.Int value. Is it possible? Or is there a method to get the integer value of the hash?
The following code throws an error: cannot convert type hash.Hash to type int64:
package main

import (
    "fmt"
    "math/big"

    "golang.org/x/crypto/sha3"
)

func main() {
    chall := "hello word"
    b := byte[](chall)
    h := sha3.New244()
    h.Write(chall)
    h.Write(b)
    d := make([]byte, 16)
    h.Sum(d)
    val := big.NewInt(int64(h))
    fmt.Println(val)
}
TL;DR
The output of sha3.New224() cannot be represented in a uint64.
There are many hash types, and of differing sizes. The Go standard library picks a very generic interface to cover all types of hashes: https://golang.org/pkg/hash/#Hash
type Hash interface {
    io.Writer
    Sum(b []byte) []byte
    Reset()
    Size() int
    BlockSize() int
}
Having said that, some Go hash implementations optionally include extra methods like hash.Hash64:
type Hash64 interface {
    Hash
    Sum64() uint64
}
others may implement encoding.BinaryMarshaler:
type BinaryMarshaler interface {
    MarshalBinary() (data []byte, err error)
}
which one can use to preserve a hash state.
sha3.New224() does not implement the above 2 interfaces, but crc64 hash does.
To do a runtime check:
h64, ok := h.(hash.Hash64)
if ok {
    fmt.Printf("64-bit: %d\n", h64.Sum64())
}
Working example: https://play.golang.org/p/uLUfw0gMZka
(See Peter's comment for the simpler version of this.)
Interpreting a series of bytes as a big.Int is the same as interpreting a series of decimal digits as an arbitrarily large number. For example, to convert the digits 1234 into a "number", you'd do this:
Start with 0
Multiply by 10 = 0
Add 1 = 1
Multiply by 10 = 10
Add 2 = 12
Multiply by 10 = 120
Add 3 = 123
Multiply by 10 = 1230
Add 4 = 1234
The same applies to bytes. The "digits" are just base-256 rather than base-10:
val := big.NewInt(0)
for i := 0; i < h.Size(); i++ {
    val.Lsh(val, 8)
    val.Add(val, big.NewInt(int64(d[i])))
}
(Lsh is a left-shift. Left shifting by 8 bits is the same as multiplying by 256.)
Playground
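For reference, the simpler version alluded to above boils down to big.Int.SetBytes, which performs exactly this big-endian, base-256 interpretation in one call. A minimal sketch (using sha3.Sum224 to produce the digest directly rather than the question's Write/Sum sequence):

package main

import (
    "fmt"
    "math/big"

    "golang.org/x/crypto/sha3"
)

func main() {
    sum := sha3.Sum224([]byte("hello word")) // [28]byte digest
    val := new(big.Int).SetBytes(sum[:])     // interpret the digest as a big-endian integer
    fmt.Println(val)
}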

binary.Write() byte ordering not working for []byte

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
)

func main() {
    b := new(bytes.Buffer)
    c := new(bytes.Buffer)
    binary.Write(b, binary.LittleEndian, []byte{0, 1})
    binary.Write(b, binary.BigEndian, []byte{0, 1})
    binary.Write(c, binary.LittleEndian, uint16(256))
    binary.Write(c, binary.BigEndian, uint16(256))
    fmt.Println(b.Bytes()) // [0 1 0 1]
    fmt.Println(c.Bytes()) // [0 1 1 0]
}
It is very interesting: why does binary.Write() byte ordering work for uint8, uint16, uint64, etc., but not for []byte?
If a []byte needs to be ordered with binary.LittleEndian and written to a bytes.Buffer, does it need to be reversed first? Is there an effective way to solve this problem?
Thanks.
Only integer types get swapped by byte ordering.
When it's a slice of bytes, the binary package wouldn't really know what to swap.
For example, how would it know what to do if you passed 1k of data?
Treat it as int16, int32 or int64?
Or would you expect it to just reverse the whole slice?
Because there is nothing to order by. A byte is 8 bits, so it can hold (unsigned) values from 0 to 255. With uint16 you have 2 bytes, so those bytes can be ordered differently.
ByteOrder is not even defined for int8. You can check the source code to see that binary.Write simply does not use the passed order when the type is uint8 or int8. (A byte is an alias for uint8.)
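To see the byte order actually being applied, hand binary.Write a slice of a multi-byte integer type instead of a []byte; a small sketch:

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
)

func main() {
    le := new(bytes.Buffer)
    be := new(bytes.Buffer)
    data := []uint16{256, 1}
    binary.Write(le, binary.LittleEndian, data) // each uint16 is written low byte first
    binary.Write(be, binary.BigEndian, data)    // each uint16 is written high byte first
    fmt.Println(le.Bytes()) // [0 1 1 0]
    fmt.Println(be.Bytes()) // [1 0 0 1]
}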

Size of a byte array golang

I have a []byte object and I want to get its size in bytes. Is there an equivalent to C's sizeof() in golang? If not, can you suggest other ways to get the same?
To return the number of bytes in a byte slice use the len function:
bs := make([]byte, 1000)
sz := len(bs)
// sz == 1000
If you mean the number of bytes in the underlying array use cap instead:
bs := make([]byte, 1000, 2000)
sz := cap(bs)
// sz == 2000
A byte is guaranteed to be one byte: https://golang.org/ref/spec#Size_and_alignment_guarantees.
I think your best bet would be:
package main

import "fmt"
import "encoding/binary"

func main() {
    thousandBytes := make([]byte, 1000)
    tenBytes := make([]byte, 10)
    fmt.Println(binary.Size(tenBytes))
    fmt.Println(binary.Size(thousandBytes))
}
https://play.golang.org/p/HhJif66VwY
Though there are many options, like just importing unsafe and using Sizeof:
import "unsafe"

size := unsafe.Sizeof(bytes)
Note that for some types, like slices, Sizeof is going to give you the size of the slice descriptor which is likely not what you want. Also, bear in mind the length and capacity of the slice are different and the value returned by binary.Size reflects the length.
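For example, a short sketch (the 24-byte figure assumes a 64-bit platform, where the slice descriptor is three words: pointer, length, capacity):

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    small := make([]byte, 10)
    large := make([]byte, 1000)
    // Both report the descriptor size (24 on 64-bit), not the data size.
    fmt.Println(unsafe.Sizeof(small), unsafe.Sizeof(large))
    // len reports the number of bytes actually in each slice.
    fmt.Println(len(small), len(large)) // 10 1000
}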

Displayed size of Go string variable seems unreal

Please see the example: http://play.golang.org/p/6d4uX15EOQ
package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

func main() {
    c := "foofoofoofoofoofofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoo"
    fmt.Printf("c: %T, %d\n", c, unsafe.Sizeof(c))
    fmt.Printf("c: %T, %d\n", c, reflect.TypeOf(c).Size())
}
Output:
c: string, 8 //8 bytes?!
c: string, 8
It seems like such a large string cannot have such a small size! What's going wrong?
Package unsafe
import "unsafe"
func Sizeof
func Sizeof(v ArbitraryType) uintptr
Sizeof returns the size in bytes occupied by the value v. The size is
that of the "top level" of the value only. For instance, if v is a
slice, it returns the size of the slice descriptor, not the size of
the memory referenced by the slice.
The Go Programming Language Specification
Length and capacity
len(s)    string type    string length in bytes
You are looking at the "top level", the string descriptor, a pointer to and the length of the underlying string value. Use the len function for the length, in bytes, of the underlying string value.
Conceptually and practically, the string descriptor is a struct containing a pointer and a length, whose sizes (32 or 64 bits) are implementation dependent. For example,
package main

import (
    "fmt"
    "unsafe"
)

type stringDescriptor struct {
    str *byte
    len int
}

func main() {
    fmt.Println("string descriptor size in bytes:", unsafe.Sizeof(stringDescriptor{}))
}
Output (64 bit):
string descriptor size in bytes: 16
Output (32 bit):
string descriptor size in bytes: 8
A string is essentially a pointer to the data, and an int for the length; so on 32-bit systems it's 8 bytes, and 16 bytes on 64-bit systems.
Both unsafe.Sizeof and reflect.TypeOf(foo).Size() show the size of the string header (two words, IIRC). If you want to get the length of a string, use len(foo).
Playground: http://play.golang.org/p/hRw-EIVIQg.
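A short sketch contrasting the two (the 16-byte header size assumes a 64-bit platform):

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    c := "foofoofoo"
    fmt.Println(unsafe.Sizeof(c)) // 16: size of the string header on 64-bit
    fmt.Println(len(c))           // 9: length of the string data in bytes
}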
