I'm assuming all I need to do is encode a 64-bit number as base64 to get an 11-character YouTube identifier. I created a Go program: https://play.golang.org/p/2nuA3JxVMd0
package main
import (
"crypto/rand"
"encoding/base64"
"encoding/binary"
"fmt"
"math"
"math/big"
"strings"
)
func main() {
// For example Youtube uses 11 characters of base64.
// How many base64 characters are needed to express a 64-bit number? (2^6)^x = 2^64, so x = 64/6 = 10.666666666 … i.e. eleven, rounded up.
// Generate a 64 bit number
val, _ := randint64()
fmt.Println(val)
// Encode the 64 bit number
b := make([]byte, 8)
binary.LittleEndian.PutUint64(b, uint64(val))
encoded := base64.StdEncoding.EncodeToString([]byte(b))
fmt.Println(encoded, len(encoded))
// https://youtu.be/gocwRvLhDf8?t=75
ytid := strings.ReplaceAll(encoded, "+", "-")
ytid = strings.ReplaceAll(ytid, "/", "_")
fmt.Println("Youtube ID from 64 bit number:", ytid)
}
func randint64() (int64, error) {
val, err := rand.Int(rand.Reader, big.NewInt(int64(math.MaxInt64)))
if err != nil {
return 0, err
}
return val.Int64(), nil
}
But it has two issues:
The identifier is 12 characters instead of the expected 11.
The encoded base64 string ends with "=", which suggests it didn't have enough data to encode?
So where am I going wrong?
tl;dr
An 8-byte int64 (no matter its value) always encodes to 11 base64 characters followed by a single padding character, =, so you can reliably do this to get your 11-character YouTube ID:
var replacer = strings.NewReplacer(
"+", "-",
"/", "_",
)
ytid := replacer.Replace(encoded[:11])
Alternatively (H/T Crowman and Peter), base64.RawURLEncoding encodes without padding and uses - and _ instead of + and /, so no replacing is needed:
//encoded := base64.StdEncoding.EncodeToString(b) // may include + or /
ytid := base64.RawURLEncoding.EncodeToString(b) // produces URL-friendly - and _
https://play.golang.org/p/AjlvtfR7RWD
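A minimal round-trip sketch, assuming you also need to turn the ID back into the number. toID and fromID are hypothetical helper names, and the big-endian byte order is an arbitrary choice (the question used little-endian); whichever order you pick, encoding and decoding must agree.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
)

// toID encodes v as an 11-character, URL-safe, unpadded base64 string.
func toID(v uint64) string {
	var b [8]byte
	binary.BigEndian.PutUint64(b[:], v)
	return base64.RawURLEncoding.EncodeToString(b[:])
}

// fromID reverses toID and rejects input that does not decode to 8 bytes.
func fromID(id string) (uint64, error) {
	b, err := base64.RawURLEncoding.DecodeString(id)
	if err != nil {
		return 0, err
	}
	if len(b) != 8 {
		return 0, fmt.Errorf("decoded %d bytes, want 8", len(b))
	}
	return binary.BigEndian.Uint64(b), nil
}

func main() {
	id := toID(11154013587666973726)
	v, err := fromID(id)
	fmt.Println(id, len(id), v, err)
}
```

Because RawURLEncoding never emits padding, every 64-bit value maps to exactly 11 characters, so no trimming step is needed.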
Each byte (i.e. 8 bits) of base64 output conveys 6 bits of input. So the formula for the number of output bytes, given a number of input bytes, is:
out = in * 8 / 6
or
out = in * 4 / 3
With a divisor of 3, some inputs leave the final output byte only partially used. If the input length in bytes is:
divisible by 3 - the final byte lands on a byte boundary
not divisible by 3 - the final byte is not on a byte-boundary and requires padding
In the case of 8 bytes of input:
out = 8 * 4 / 3 = 10 2/3
yields 10 fully utilized base64 output bytes and one partially utilized byte (for the 2/3), so 11 base64 bytes plus padding to indicate how much was wasted.
Padding is indicated by the = character, and the number of = characters reflects the "wasted" fraction of an output byte:
waste   padding
-----   -------
0       (none)
1/3     =
2/3     ==
Since the output uses 10 2/3 bytes, 1/3 of a byte is "wasted", so the padding is a single =.
So base64 encoding 8 input bytes will always produce 11 base64 bytes followed by a single = padding character to produce 12 bytes in total.
The = in base64 is padding. For a 64-bit number this padding is redundant, so 12 characters aren't really required. Why is it added?
See the source of the Encoding.Encode function:
func (enc *Encoding) Encode(dst, src []byte) {
if len(src) == 0 {
return
}
// enc is a pointer receiver, so the use of enc.encode within the hot
// loop below means a nil check at every operation. Lift that nil check
// outside of the loop to speed up the encoder.
_ = enc.encode
di, si := 0, 0
n := (len(src) / 3) * 3
// ... (rest of the function omitted)
}
// Source: https://golang.org/src/encoding/base64/base64.go
In the (len(src) / 3) * 3 part, the input is consumed in groups of 3 bytes (24 bits, i.e. four 6-bit base64 characters), not 6.
So the padded output length is always a multiple of 4. If your input is always 64 bits, you can delete the = after encoding and add it back before decoding.
for i := 8; i <= 18; i++ {
b := make([]byte, i)
binary.LittleEndian.PutUint64(b, uint64(0))
encoded := base64.StdEncoding.EncodeToString(b)
fmt.Println(encoded)
}
Output:
AAAAAAAAAAA=
AAAAAAAAAAAA
AAAAAAAAAAAAAA==
AAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAA==
AAAAAAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAA==
AAAAAAAAAAAAAAAAAAAAAAA=
AAAAAAAAAAAAAAAAAAAAAAAA
What do I mean by 6 (or 3)?
base64 uses 64 characters; each character maps to one 6-bit value (from 000000 to 111111).
Example:
a 64-bit value (uint64):
11154013587666973726
binary representation:
1001101011001011000001000100001011110000110001010011010000011110
Split it into groups of six bits (two leading zero bits are added to make 66 bits):
001001,101011,001011,000001,000100,001011,110000,110001,010011,010000,011110
J, r, L, B, E, L, w, x, T, Q, e
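The grouping above can be sketched in Go. sixBitGroups is a hypothetical helper that reads the value from the most significant end, six bits at a time, and looks each group up in the standard base64 alphabet; like the example, the top group holds only 4 real bits. This illustrates the bit arithmetic only: base64.StdEncoding works on bytes from the front and pads at the end, so its output for the same 8 big-endian bytes is a different string.

```go
package main

import "fmt"

const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

// sixBitGroups maps v to 11 characters, most significant 6-bit group first.
// The first group is v>>60, i.e. two implicit leading zeros plus 4 bits.
func sixBitGroups(v uint64) string {
	out := make([]byte, 11)
	for i := 0; i < 11; i++ {
		shift := uint(6 * (10 - i))
		out[i] = alphabet[(v>>shift)&0x3F]
	}
	return string(out)
}

func main() {
	fmt.Println(sixBitGroups(11154013587666973726)) // JrLBELwxTQe
}
```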
I need to encode integer keys as byte slices for a KV database.
I want to make the encoding smaller and cut the zero padding.
I thought the varint encoding from the binary package would be the way to go.
But in both cases, varint and fixed, the byte slice length is the same.
Only the bit arrangement differs, since the first bit is used as a flag.
I assumed the varint encoding would cut the "extra fat". It doesn't.
package main
import (
"encoding/binary"
"fmt"
)
func main() {
x := 16
y := 106547
fmt.Println(x)
fmt.Println(y)
// Variant
bvx := make([]byte, 8)
bvy := make([]byte, 8)
xbts := binary.PutUvarint(bvx, uint64(x))
ybts := binary.PutUvarint(bvy, uint64(y))
fmt.Println("Variant bytes written x: ", xbts)
fmt.Println("Variant bytes written y: ", ybts)
fmt.Println(bvx)
fmt.Println(bvy)
fmt.Println("bvx length: ", len(bvx))
fmt.Println("bvy length: ", len(bvy))
// Fixed
bfx := make([]byte, 8)
bfy := make([]byte, 8)
binary.LittleEndian.PutUint64(bfx, uint64(x))
binary.LittleEndian.PutUint64(bfy, uint64(y))
fmt.Println(bfx)
fmt.Println(bfy)
fmt.Println("bfx length: ", len(bfx))
fmt.Println("bfy length: ", len(bfy))
}
My question is: do I have to slice the byte slice manually with varint encoding to get rid of the extra bytes?
Since PutUvarint returns the number of bytes written, I can just slice the byte slice.
Is this the right way to do it?
If not, what is the correct way to make the slices smaller?
Thanks
From the documentation of package binary (import "encoding/binary"):
func PutUvarint(buf []byte, x uint64) int
PutUvarint encodes a uint64 into buf and returns the number of bytes written. If the buffer is too small, PutUvarint will panic.
Fix your code:
bvx := make([]byte, binary.MaxVarintLen64)
bvy := make([]byte, binary.MaxVarintLen64)
bvx = bvx[:binary.PutUvarint(bvx[:cap(bvx)], uint64(x))]
bvy = bvy[:binary.PutUvarint(bvy[:cap(bvy)], uint64(y))]
package main
import (
"encoding/binary"
"fmt"
)
func main() {
x := 16
y := 106547
fmt.Println(x)
fmt.Println(y)
// Variant
bvx := make([]byte, binary.MaxVarintLen64)
bvy := make([]byte, binary.MaxVarintLen64)
bvx = bvx[:binary.PutUvarint(bvx[:cap(bvx)], uint64(x))]
bvy = bvy[:binary.PutUvarint(bvy[:cap(bvy)], uint64(y))]
fmt.Println("Variant bytes written x: ", len(bvx))
fmt.Println("Variant bytes written y: ", len(bvy))
fmt.Println(bvx)
fmt.Println(bvy)
fmt.Println("bvx length: ", len(bvx))
fmt.Println("bvy length: ", len(bvy))
// Fixed
bfx := make([]byte, 8)
bfy := make([]byte, 8)
binary.LittleEndian.PutUint64(bfx, uint64(x))
binary.LittleEndian.PutUint64(bfy, uint64(y))
fmt.Println(bfx)
fmt.Println(bfy)
fmt.Println("bfx length: ", len(bfx))
fmt.Println("bfy length: ", len(bfy))
}
Playground: https://play.golang.org/p/XN46KafMY23
Output:
16
106547
Variant bytes written x: 1
Variant bytes written y: 3
[16]
[179 192 6]
bvx length: 1
bvy length: 3
[16 0 0 0 0 0 0 0]
[51 160 1 0 0 0 0 0]
bfx length: 8
bfy length: 8
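A small sketch of the key round trip, assuming the keys are unsigned. encodeKey and decodeKey are hypothetical names; decoding uses binary.Uvarint, whose second result is the number of bytes read, or 0 or a negative value when the buffer is empty, truncated, or overflows.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeKey returns only the bytes PutUvarint actually wrote.
func encodeKey(x uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, x)
	return buf[:n]
}

// decodeKey reverses encodeKey; n <= 0 signals a malformed buffer.
func decodeKey(b []byte) (uint64, error) {
	x, n := binary.Uvarint(b)
	if n <= 0 {
		return 0, fmt.Errorf("invalid varint (n=%d)", n)
	}
	return x, nil
}

func main() {
	for _, v := range []uint64{16, 106547} {
		k := encodeKey(v)
		back, err := decodeKey(k)
		fmt.Println(v, k, len(k), back, err)
	}
}
```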
I have a file containing two bytes, in Big Endian order, hexdump gives me:
81 50
which is 1000 0001 0101 0000 in binary. However, I want the most significant bit to be a flag, so in golang I have to load the file content, clear the most significant bit, and only then read the value.
So:
valueBuf := make([]byte, 2)
_, err := f.Read(valueBuf) // now printing valueBuf gives me [129 80] in decimal
value := int16(binary.BigEndian.Uint16(valueBuf[0:2])) // now value is -32432
Ok, I have tried to use something like:
func clearBit(n int16, pos uint) int16 {
mask := ^(1 << pos)
n &= mask
return n
}
But it apparently doesn't work as expected. The output value should be 336 in decimal, as normal int, and I cannot get it. How should I do this?
For n &= mask to work, n and mask must have the same type. So you should write
mask := int16(^(1 << pos))
Then value = clearBit(value, 15) works fine.
Or, since constants are untyped, you can eliminate mask, and also eliminate the assignment to n since it's just returned on the following line, and shorten clearBit to
func clearBit(n int16, pos uint) int16 {
return n & ^(1 << pos)
}
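Go also has a dedicated AND NOT operator, &^, which clears the bits set in its right operand. The sketch below uses it with uint16 instead of int16, assuming you don't need a signed result, so the flag bit never makes the value negative. The hard-coded bytes stand in for the two bytes read from the file, and the flag is read before it is cleared.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// clearBit clears bit pos of n using the &^ (AND NOT) operator.
func clearBit(n uint16, pos uint) uint16 {
	return n &^ (1 << pos)
}

func main() {
	buf := []byte{0x81, 0x50} // stand-in for the two big-endian bytes in the file
	raw := binary.BigEndian.Uint16(buf)
	flag := raw>>15 == 1
	value := clearBit(raw, 15)
	fmt.Println(flag, value) // true 336
}
```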
I am trying to read a user input with bufio in console. The text can have some special characters (é, à, ♫, ╬,...).
The code looks like this:
reader := bufio.NewReader(os.Stdin)
input, _ := reader.ReadString('\n')
If I type, for example, "é", ReadString reads it as "c3 a9" instead of "00e9". How can I read the text input as Unicode code points instead of UTF-8? I need to use this value as a hash table key.
Thanks
Go strings are conceptually a read-only slice of a read-only byte array. The encoding of that byte array is not specified, but string constants will be UTF-8, and using UTF-8 in other strings is the recommended approach.
Go provides convenience functions for accessing the UTF-8 as Unicode code points (or runes in Go-speak). A range loop over a string will do the UTF-8 decoding for you. Converting to []rune will give you a rune slice, i.e. the Unicode code points in order. These goodies only work on UTF-8 encoded strings/byte arrays. I would strongly suggest using UTF-8 internally.
An example:
package main
import (
"bufio"
"fmt"
"os"
)
func main() {
reader := bufio.NewReader(os.Stdin)
input, _ := reader.ReadString('\n')
// fmt.Println writes to stdout; the builtin println writes to stderr.
fmt.Println("non-range loop - bytes")
for i := 0; i < len(input); i++ {
fmt.Printf("%d %d %[2]x\n", i, input[i])
}
fmt.Println("range-loop - runes")
for idx, r := range input {
fmt.Printf("%d %d %[2]c\n", idx, r)
}
fmt.Println("converted to rune slice")
rs := []rune(input)
fmt.Printf("%#v\n", rs)
}
With the input: X é X
non-range loop - bytes
0 88 58
1 32 20
2 195 c3
3 169 a9
4 32 20
5 88 58
6 10 a
range-loop - runes
0 88 X
1 32
2 233 é
4 32
5 88 X
6 10
converted to rune slice
[]int32{88, 32, 233, 32, 88, 10}
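Unicode and UTF-8 are not alternatives to each other: Unicode assigns code points to characters, and UTF-8 is one way of encoding those code points as bytes, so a UTF-8 string is still Unicode text. I learned a lot about this by reading "Strings, bytes, runes and characters in Go".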
To answer your question,
you can use DecodeRuneInString from the unicode/utf8 package.
s := "é"
r, _ := utf8.DecodeRuneInString(s) // r, not rune, to avoid shadowing the builtin type
fmt.Printf("%x", r)
DecodeRuneInString(s) returns the first UTF-8 encoded character (rune) in s along with that character's width in bytes. So if you want to get the Unicode code point of each rune in a string, here's how to do it. This is the example from the package documentation, only slightly modified.
str := "Hello, 世界"
for len(str) > 0 {
r, size := utf8.DecodeRuneInString(str)
fmt.Printf("%x %v\n", r, size)
str = str[size:]
}
Try in Playground.
Alternatively, as Juergen points out, you can use a range loop over the string to get the runes it contains.
str := "Hello, 世界"
for _, r := range str {
fmt.Printf("%x\n", r)
}
Try in Playground
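Since the goal is a hash table key, here is a minimal sketch that uses the rune itself as the key, assuming per-character keys are what you want (a whole UTF-8 string also works as a map key, since string keys compare byte by byte). countRunes is a hypothetical helper, and the escaped literal stands in for a line read with ReadString.

```go
package main

import (
	"fmt"
	"strings"
)

// countRunes tallies each code point in s. The map key is the rune itself
// (an int32 code point such as 0xE9 for é), not its UTF-8 bytes.
func countRunes(s string) map[rune]int {
	counts := make(map[rune]int)
	for _, r := range s {
		counts[r]++
	}
	return counts
}

func main() {
	input := "\u00e9 \u00e0 \u266b \u00e9\n" // stand-in for a line from ReadString('\n')
	counts := countRunes(strings.TrimRight(input, "\r\n"))
	fmt.Println(counts['\u00e9'], counts['\u266b']) // 2 1
}
```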
I have to do a cryptography project for school, and I chose Go for it!
I've read the docs, but I only know C, so it's kinda hard for me right now.
First, I needed to collect the program arguments, which I did. I stored the arguments in string variables like this:
var text, base string = os.Args[1], os.Args[6]
Now I need to store the ASCII codes in an array of int. For example, in C I would have done something like this:
int arr[18];
char str[18] = "Hi Stack OverFlow";
arr[i] = str[i] - 96;
So how could I do that in Go?
Thanks !
Here's an example that is similar to the other answer but avoids importing additional packages.
Create a slice of int with the length equal to the string's length. Then iterate over the string to extract each character as int and assign it to the corresponding index in the int slice. Here's code (also on the Go Playground):
package main
import "fmt"
func main() {
s := "Hi Stack OverFlow"
fmt.Println(StringToInts(s))
}
// makes a slice of int and stores each char from string
// as int in the slice
func StringToInts(s string) (intSlice []int) {
intSlice = make([]int, len(s))
for i := 0; i < len(s); i++ { // index every byte; a range loop over a string steps by rune
intSlice[i] = int(s[i])
}
return
}
Output of the above program is:
[72 105 32 83 116 97 99 107 32 79 118 101 114 70 108 111 119]
The StringToInts function above should do what you want. Though it returns a slice (not an array) of int, it should satisfy your use case.
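For comparison, a literal translation of the C line arr[i] = str[i] - 96 might look like this sketch. It uses a real fixed-size array, whose length must be a constant, so the string is a hypothetical constant msg. As in the C version, uppercase letters and spaces come out negative, because only lowercase letters are at or above 97.

```go
package main

import "fmt"

const msg = "Hi Stack OverFlow"

// offsets mirrors the C loop arr[i] = str[i] - 96 using an array of
// len(msg) ints; len of a constant string is itself a constant.
func offsets() [len(msg)]int {
	var arr [len(msg)]int
	for i := 0; i < len(msg); i++ {
		arr[i] = int(msg[i]) - 96
	}
	return arr
}

func main() {
	fmt.Println(offsets())
}
```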
My guess is that you want something like this:
package main
import (
"fmt"
"strings"
)
// transform transforms ASCII letters to numbers.
// Letters in the English (basic Latin) alphabet, both upper and lower case,
// are represented by a number between one and twenty-six. All other characters,
// including space, are represented by the number zero.
func transform(s string) []int {
n := make([]int, 0, len(s))
other := 'a' - 1
for _, r := range strings.ToLower(s) {
if 'a' > r || r > 'z' {
r = other
}
n = append(n, int(r-other))
}
return n
}
func main() {
s := "Hi Stack OverFlow"
fmt.Println(s)
n := transform(s)
fmt.Println(n)
}
Output:
Hi Stack OverFlow
[8 9 0 19 20 1 3 11 0 15 22 5 18 6 12 15 23]
Take A Tour of Go and see if you can understand what the program does.
I'm trying to encode a large number into a list of bytes (uint8 in Go).
The number of bytes is unknown, so I'd like to use a vector.
But Go doesn't provide a vector of bytes; what can I do?
And is it possible to get a slice of such a byte vector?
I intend to implement data compression.
Instead of storing small and large numbers with the same number of bytes,
I'm implementing a variable-byte encoding that uses fewer bytes for small numbers
and more bytes for large numbers.
My code doesn't compile (invalid type assertion):
package main
import (
//"fmt"
"container/vector"
)
func vbEncodeNumber(n uint) []byte{
bytes := new(vector.Vector)
for {
bytes.Push(n % 128)
if n < 128 {
break
}
n /= 128
}
bytes.Set(bytes.Len()-1, bytes.Last().(byte)+byte(128))
return bytes.Data().([]byte) // <-
}
func main() { vbEncodeNumber(10000) }
I want to write a lot of such encoded numbers to a binary file,
so I'd like the function to return a byte array.
I haven't found a code example using vector.
Since you're trying to represent large numbers, you might see if the big package serves your purposes.
The general Vector struct can be used to store bytes. It accepts an empty interface as its type, and any other type satisfies that interface. You can retrieve a slice of interfaces through the Data method, but there's no way to convert that to a slice of bytes without copying it. You can't use type assertion to turn a slice of interface{} into a slice of something else. You'd have to do something like the following at the end of your function: (I haven't tried compiling this code because I can't right now)
byteSlice := make([]byte, bytes.Len())
for i := range byteSlice {
// The assertion must match the type that was pushed (n % 128 is a uint).
byteSlice[i] = byte(bytes.At(i).(uint))
}
return byteSlice
Take a look at the bytes package and the Buffer type there. You can write your ints as bytes into the buffer and then use the Bytes() method to get the buffer's contents as a byte slice.
I've found the vectors to be a lot less useful since the generic append and copy were added to the language. Here's how I'd do it in one shot with less copying:
package main
import "fmt"
func vbEncodeNumber(n uint) []byte {
bytes := make([]byte, 0, 4)
for n > 0 {
bytes = append(bytes, byte(n%256))
n >>= 8
}
return bytes
}
func main() {
bytes := vbEncodeNumber(10000)
for i := len(bytes) - 1; i >= 0; i-- {
fmt.Printf("%02x ", bytes[i])
}
fmt.Println()
}
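container/vector is not part of Go 1's standard library, so here is a sketch of the question's own variable-byte scheme rewritten with append: 7-bit groups, least significant first, with 128 added to the final byte only. vbDecodeNumber is a hypothetical decoder added for illustration. Unlike the base-256 version above, this output is self-delimiting and encodes 0 as one byte instead of an empty slice. Note that encoding/binary's Uvarint uses the opposite convention (the high bit is set on every byte except the last), so the two formats are not interchangeable.

```go
package main

import "fmt"

// vbEncodeNumber follows the question's scheme: 7-bit groups, low group
// first, with the high bit (128) set only on the terminating byte.
func vbEncodeNumber(n uint) []byte {
	b := make([]byte, 0, 4)
	for {
		b = append(b, byte(n%128))
		if n < 128 {
			break
		}
		n /= 128
	}
	b[len(b)-1] += 128
	return b
}

// vbDecodeNumber reads one number from the front of b and returns it with
// the number of bytes consumed; size 0 means no terminating byte was found.
func vbDecodeNumber(b []byte) (n uint, size int) {
	var shift uint
	for i, c := range b {
		if c >= 128 {
			return n | uint(c-128)<<shift, i + 1
		}
		n |= uint(c) << shift
		shift += 7
	}
	return 0, 0
}

func main() {
	enc := vbEncodeNumber(10000)
	fmt.Println(enc) // [16 206]
	fmt.Println(vbDecodeNumber(enc)) // 10000 2
}
```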