How can I convert a zero-terminated byte array to string? - go

I need to read [100]byte to transfer a bunch of string data.
Because not all of the strings are precisely 100 characters long, the remaining part of the byte array is padded with 0s.
If I convert [100]byte to string by: string(byteArray[:]), the tailing 0s are displayed as ^#^#s.
In C, the string will terminate upon 0, so what's the best way to convert this byte array to string in Go?

Methods that read data into byte slices return the number of bytes read. You should save that number and then use it to create your string. If n is the number of bytes read, your code would look like this:
s := string(byteArray[:n])
To convert the full string, this can be used:
s := string(byteArray[:len(byteArray)])
This is equivalent to:
s := string(byteArray[:])
If for some reason you don't know n, you could use the bytes package to find it, assuming your input doesn't have a null character embedded in it.
n := bytes.Index(byteArray[:], []byte{0})
Or as icza pointed out, you can use the code below:
n := bytes.IndexByte(byteArray[:], 0)

Use:
s := string(byteArray[:])

Simplistic solution:
str := fmt.Sprintf("%s", byteArray)
I'm not sure how performant this is though.

For example,
package main
import "fmt"
func CToGoString(c []byte) string {
n := -1
for i, b := range c {
if b == 0 {
break
}
n = i
}
return string(c[:n+1])
}
func main() {
c := [100]byte{'a', 'b', 'c'}
fmt.Println("C: ", len(c), c[:4])
g := CToGoString(c[:])
fmt.Println("Go:", len(g), g)
}
Output:
C: 100 [97 98 99 0]
Go: 3 abc

The following code is looking for '\0', and under the assumptions of the question the array can be considered sorted since all non-'\0' precede all '\0'. This assumption won't hold if the array can contain '\0' within the data.
Find the location of the first zero-byte using a binary search, then slice.
You can find the zero-byte like this:
package main
import "fmt"
func FirstZero(b []byte) int {
min, max := 0, len(b)
for {
if min + 1 == max { return max }
mid := (min + max) / 2
if b[mid] == '\000' {
max = mid
} else {
min = mid
}
}
return len(b)
}
func main() {
b := []byte{1, 2, 3, 0, 0, 0}
fmt.Println(FirstZero(b))
}
It may be faster just to naively scan the byte array looking for the zero-byte, especially if most of your strings are short.

When you do not know the exact length of non-nil bytes in the array, you can trim it first:
string(bytes.Trim(arr, "\x00"))

Use this:
bytes.NewBuffer(byteArray).String()

Only use for performance tuning.
package main
import (
"fmt"
"reflect"
"unsafe"
)
func BytesToString(b []byte) string {
return *(*string)(unsafe.Pointer(&b))
}
func StringToBytes(s string) []byte {
return *(*[]byte)(unsafe.Pointer(&s))
}
func main() {
b := []byte{'b', 'y', 't', 'e'}
s := BytesToString(b)
fmt.Println(s)
b = StringToBytes(s)
fmt.Println(string(b))
}

Though not extremely performant, the only readable solution is:
// Split by separator and pick the first one.
// This has all the characters till null, excluding null itself.
retByteArray := bytes.Split(byteArray[:], []byte{0}) [0]
// OR
// If you want a true C-like string, including the null character
retByteArray := bytes.SplitAfter(byteArray[:], []byte{0}) [0]
A full example to have a C-style byte array:
package main
import (
"bytes"
"fmt"
)
func main() {
var byteArray = [6]byte{97,98,0,100,0,99}
cStyleString := bytes.SplitAfter(byteArray[:], []byte{0}) [0]
fmt.Println(cStyleString)
}
A full example to have a Go style string excluding the nulls:
package main
import (
"bytes"
"fmt"
)
func main() {
var byteArray = [6]byte{97, 98, 0, 100, 0, 99}
goStyleString := string(bytes.Split(byteArray[:], []byte{0}) [0])
fmt.Println(goStyleString)
}
This allocates a slice of slice of bytes. So keep an eye on performance if it is used heavily or repeatedly.

Use slices instead of arrays for reading. For example, io.Reader accepts a slice, not an array.
Use slicing instead of zero padding.
Example:
buf := make([]byte, 100)
n, err := myReader.Read(buf)
if n == 0 && err != nil {
log.Fatal(err)
}
consume(buf[:n]) // consume() will see an exact (not padded) slice of read data

Here is an option that removes the null bytes:
package main
import "golang.org/x/sys/windows"
func main() {
b := []byte{'M', 'a', 'r', 'c', 'h', 0}
s := windows.ByteSliceToString(b)
println(s == "March")
}
https://pkg.go.dev/golang.org/x/sys/unix#ByteSliceToString
https://pkg.go.dev/golang.org/x/sys/windows#ByteSliceToString

Related

Better to compare slices or bytes?

I'm just curious on which of these methods is better (or if there's an even better one that I'm missing). I'm trying to determine if the first letter and last letter of a word are the same, and there are two obvious solutions to me.
if word[:1] == word[len(word)-1:]
or
if word[0] == word[len(word)-1]
As I understand it, the first is just pulling slices of the string and doing a string comparison, while the second is pulling the character from either end and comparing as bytes.
I'm curious if there's a performance difference between the two, and if there's any "preferable" way to do this?
In Go, strings are UTF-8 encoded. UTF-8 is a variable-length encoding.
package main
import "fmt"
func main() {
word := "世界世"
fmt.Println(word[:1] == word[len(word)-1:])
fmt.Println(word[0] == word[len(word)-1])
}
Output:
false
false
If you really want to compare a byte, not a character, then be as precise as possible for the compiler. Obviously, compare a byte, not a slice.
BenchmarkSlice-4 200000000 7.55 ns/op
BenchmarkByte-4 2000000000 1.08 ns/op
package main
import "testing"
var word = "word"
func BenchmarkSlice(b *testing.B) {
for i := 0; i < b.N; i++ {
if word[:1] == word[len(word)-1:] {
}
}
}
func BenchmarkByte(b *testing.B) {
for i := 0; i < b.N; i++ {
if word[0] == word[len(word)-1] {
}
}
}
If by letter you mean rune, then use:
func eqRune(s string) bool {
if s == "" {
return false // or true if that makes more sense for the app
}
f, _ := utf8.DecodeRuneInString(s) // 2nd return value is rune size. ignore it.
l, _ := utf8.DecodeLastRuneInString(s) // 2nd return value is rune size. ignore it.
if f != l {
return false
}
if f == unicode.ReplacementChar {
// First and last are invalid UTF-8. Fallback to
// comparing bytes.
return s[0] == s[len(s)-1]
}
return true
}
If you mean byte, then use:
func eqByte(s string) bool {
if s == "" {
return false // or true if that makes more sense for the app
}
return s[0] == s[len(s)-1]
}
Comparing individual bytes is faster than comparing string slices as shown by the benchmark in another answer.
playground example
A string is a sequence of bytes. Your method works if you know the string contains only ASCII characters. Otherwise, you should use a method that handles multibyte characters instead of string indexing. You can convert it to a rune slice to process code points or characters, like this:
r := []rune(s)
return r[0] == r[len(r) - 1]
You can read more about strings, byte slices, runes, and code points in the official Go Blog post on the subject.
To answer your question, there's no significant performance difference between the two index expressions you posted.
Here's a runnable example:
package main
import "fmt"
func EndsMatch(s string) bool {
r := []rune(s)
return r[0] == r[len(r) - 1]
}
func main() {
tests := []struct{
s string
e bool
}{
{"foo", false},
{"eve", true},
{"世界世", true},
}
for _, t := range tests {
r := EndsMatch(t.s)
if r != t.e {
fmt.Printf("EndsMatch(%s) failed: expected %t, got %t\n", t.s, t.e, r)
}
}
}
Prints nothing.

How to use fmt.Sscan to parse integers into an array?

I'm trying to scan a list of integers from a string into an array (or alternatively, a slice)
package main
import "fmt"
func main() {
var nums [5]int
n, _ := fmt.Sscan("1 2 3 4 5", &nums) // doesn't work
fmt.Println(nums)
}
What do I need to pass as second argument to Sscan in order for this to work?
I know I could pass nums[0], nums[1] ... etc., but I'd prefer a single argument.
I don't think this is possible as a convenient one-liner. As Sscan takes ...interface{}, you would need to pass slice of interfaces as well, hence converting your array first:
func main() {
var nums [5]int
// Convert to interfaces
xnums := make([]interface{}, len(nums))
for n := range nums {
xnums[n] = &nums[n]
}
n, err := fmt.Sscan("1 2 3 4 5", xnums...)
if err != nil {
fmt.Printf("field %d: %s\n", n+1, err)
}
fmt.Println(nums)
}
http://play.golang.org/p/1X28J7JJwl
Obviously you could mix different types in your interface array, so it would make the scanning of more complex string easier. For simply space-limited integers, you might be better using strings.Split or bufio.Scanner along with strconv.Atoi.
To allow this to work on more than just hard-coded strings, it's probably better to use a bufio.Scanner, and an io.Reader interface to do this:
package main
import (
"bufio"
"fmt"
"io"
"strconv"
"strings"
)
func scanInts(r io.Reader) ([]int, error) {
s := bufio.NewScanner(r)
s.Split(bufio.ScanWords)
var ints []int
for s.Scan() {
i, err := strconv.Atoi(s.Text())
if err != nil {
return ints, err
}
ints = append(ints, i)
}
return ints, s.Err()
}
func main() {
input := "1 2 3 4 5"
ints, err := scanInts(strings.NewReader(input))
if err != nil {
fmt.Println(err)
}
fmt.Println(ints)
}
Produces:
[1 2 3 4 5]
Playground
Unless you're trying to use Sscann specifically you can also try this as an alternative:
split the input string by spaces
iterate the resulting array
convert each string into an int
store the resulting value into an int slice
Like this:
package main
import (
"fmt"
"strconv"
"strings"
)
func main() {
nums := make([]int, 0)
for _, s := range strings.Split("1 2 3 4 5", " ") {
i, e := strconv.Atoi(s)
if e != nil {
i = 0 // that was not a number, default to 0
}
nums = append(nums, i)
}
fmt.Println(nums)
}
http://play.golang.org/p/rCZl46Ixd4

Golang: Convert byte array to big.Int

I'm trying to create an RSA Public Key from a Modulus and Exponent stored in a byte array. After some experimentation I've got the following:
func bytes_to_int(b []byte) (acc uint64) {
length := len(b)
if length % 4 != 0 {
extra := (4 - length % 4)
b = append([]byte(strings.Repeat("\000", extra)), b...)
length += extra
}
var block uint32
for i := 0; i < length; i += 4 {
block = binary.BigEndian.Uint32(b[i:i+4])
acc = (acc << 32) + uint64(block)
}
return
}
func main() {
fmt.Println(bytes_to_int(data[:128]))
fmt.Println(bytes_to_int(data[128:]))
}
This appears to work (although I'm not convinced there isn't a better way). My next step was to convert it to use math/big in order to handle larger numbers. I can see an Lsh function to do the << but can't figure out how to recursively add the Uint32(block) to the big.Int.
For reference, the Public Key I'm attempting to import is a Mixmaster Key stored in a keyring (pubring.mix):
http://www.mixmin.net/draft-sassaman-mixmaster-XX.html#key-format
http://pinger.mixmin.net/pubring.mix
You want Int.SetBytes to make a big.int from a slice of []byte.
func (z *Int) SetBytes(buf []byte) *Int
SetBytes interprets buf as the bytes of a big-endian unsigned integer, sets z to that value, and returns z.
This should be quite straightforward to use in your application since your keys are in big-endian format according to the doc you linked.
import "math/big"
z := new(big.Int)
z.SetBytes(byteSliceHere)
Like Nick mentioned, you could use SetBytes, keep in mind the input is in base64 so you have to decode that first.
Example:
func Base64ToInt(s string) (*big.Int, error) {
data, err := base64.StdEncoding.DecodeString(s)
if err != nil {
return nil, err
}
i := new(big.Int)
i.SetBytes(data)
return i, nil
}

The binary representation of unsigned integer in Go

Is there a built-in function to convert a uint to a slice of binary integers {0,1} ?
>> convert_to_binary(2)
[1, 0]
I am not aware of such a function, however you can use strconv.FormatUint for that purpose.
Example (on play):
func Bits(i uint64) []byte {
bits := []byte{}
for _, b := range strconv.FormatUint(i, 2) {
bits = append(bits, byte(b - rune('0')))
}
return bits
}
FormatUint will return the string representation of the given uint to a base, in this case 2, so we're encoding it in binary. So the returned string for i=2 looks like this: "10". In bytes this is [49 48] as 1 is 49 and 0 is 48 in ASCII and Unicode. So we just need to iterate over the string, subtracting 48 from each rune (unicode character) and converting it to a byte.
Here is another method:
package main
import (
"bytes"
"fmt"
"math/bits"
)
func unsigned(x uint) []byte {
b := make([]byte, bits.UintSize)
for i := range b {
if bits.LeadingZeros(x) == 0 {
b[i] = 1
}
x = bits.RotateLeft(x, 1)
}
return b
}
func trimUnsigned(x uint) []byte {
return bytes.TrimLeft(unsigned(x), string(0))
}
func main() {
b := trimUnsigned(2)
fmt.Println(b) // [1 0]
}
https://golang.org/pkg/math/bits#LeadingZeros

Convert []string to []byte

I am looking to convert a string array to a byte array in GO so I can write it down to a disk. What is an optimal solution to encode and decode a string array ([]string) to a byte array ([]byte)?
I was thinking of iterating the string array twice, first one to get the actual size needed for the byte array and then a second one to write the length and actual string ([]byte(str)) for each element.
The solution must be able to convert it the other-way; from a []byte to a []string.
Lets ignore the fact that this is Go for a second. The first thing you need is a serialization format to marshal the []string into.
There are many option here. You could build your own or use a library. I am going to assume you don't want to build your own and jump to serialization formats go supports.
In all examples, data is the []string and fp is the file you are reading/writing to. Errors are being ignored, check the returns of functions to handle errors.
Gob
Gob is a go only binary format. It should be relatively space efficient as the number of strings increases.
enc := gob.NewEncoder(fp)
enc.Encode(data)
Reading is also simple
var data []string
dec := gob.NewDecoder(fp)
dec.Decode(&data)
Gob is simple and to the point. However, the format is only readable with other Go code.
Json
Next is json. Json is a format used just about everywhere. This format is just as easy to use.
enc := json.NewEncoder(fp)
enc.Encode(data)
And for reading:
var data []string
dec := json.NewDecoder(fp)
dec.Decode(&data)
XML
XML is another common format. However, it has pretty high overhead and not as easy to use. While you could just do the same you did for gob and json, proper xml requires a root tag. In this case, we are using the root tag "Strings" and each string is wrapped in an "S" tag.
type Strings struct {
S []string
}
enc := xml.NewEncoder(fp)
enc.Encode(Strings{data})
var x Strings
dec := xml.NewDecoder(fp)
dec.Decode(&x)
data := x.S
CSV
CSV is different from the others. You have two options, use one record with n rows or n records with 1 row. The following example uses n records. It would be boring if I used one record. It would look too much like the others. CSV can ONLY hold strings.
enc := csv.NewWriter(fp)
for _, v := range data {
enc.Write([]string{v})
}
enc.Flush()
To read:
var err error
var data string
dec := csv.NewReader(fp)
for err == nil { // reading ends when an error is reached (perhaps io.EOF)
var s []string
s, err = dec.Read()
if len(s) > 0 {
data = append(data, s[0])
}
}
Which format you use is a matter of preference. There are many other possible encodings that I have not mentioned. For example, there is an external library called bencode. I don't personally like bencode, but it works. It is the same encoding used by bittorrent metadata files.
If you want to make your own encoding, encoding/binary is a good place to start. That would allow you to make the most compact file possible, but I hardly thing it is worth the effort.
The gob package will do this for you http://godoc.org/encoding/gob
Example to play with http://play.golang.org/p/e0FEZm-qiS
same source code is below.
package main
import (
"bytes"
"encoding/gob"
"fmt"
)
func main() {
// store to byte array
strs := []string{"foo", "bar"}
buf := &bytes.Buffer{}
gob.NewEncoder(buf).Encode(strs)
bs := buf.Bytes()
fmt.Printf("%q", bs)
// Decode it back
strs2 := []string{}
gob.NewDecoder(buf).Decode(&strs2)
fmt.Printf("%v", strs2)
}
to convert []string to []byte
var str = []string{"str1","str2"}
var x = []byte{}
for i:=0; i<len(str); i++{
b := []byte(str[i])
for j:=0; j<len(b); j++{
x = append(x,b[j])
}
}
to convert []byte to string
str := ""
var x = []byte{'c','a','t'}
for i := 0; i < len(x); i++ {
str += string(x[i])
}
To illustrate the problem, convert []string to []byte and then convert []byte back to []string, here's a simple solution:
package main
import (
"encoding/binary"
"fmt"
)
const maxInt32 = 1<<(32-1) - 1
func writeLen(b []byte, l int) []byte {
if 0 > l || l > maxInt32 {
panic("writeLen: invalid length")
}
var lb [4]byte
binary.BigEndian.PutUint32(lb[:], uint32(l))
return append(b, lb[:]...)
}
func readLen(b []byte) ([]byte, int) {
if len(b) < 4 {
panic("readLen: invalid length")
}
l := binary.BigEndian.Uint32(b)
if l > maxInt32 {
panic("readLen: invalid length")
}
return b[4:], int(l)
}
func Decode(b []byte) []string {
b, ls := readLen(b)
s := make([]string, ls)
for i := range s {
b, ls = readLen(b)
s[i] = string(b[:ls])
b = b[ls:]
}
return s
}
func Encode(s []string) []byte {
var b []byte
b = writeLen(b, len(s))
for _, ss := range s {
b = writeLen(b, len(ss))
b = append(b, ss...)
}
return b
}
func codecEqual(s []string) bool {
return fmt.Sprint(s) == fmt.Sprint(Decode(Encode(s)))
}
func main() {
var s []string
fmt.Println("equal", codecEqual(s))
s = []string{"", "a", "bc"}
e := Encode(s)
d := Decode(e)
fmt.Println("s", len(s), s)
fmt.Println("e", len(e), e)
fmt.Println("d", len(d), d)
fmt.Println("equal", codecEqual(s))
}
Output:
equal true
s 3 [ a bc]
e 19 [0 0 0 3 0 0 0 0 0 0 0 1 97 0 0 0 2 98 99]
d 3 [ a bc]
equal true
I would suggest to use PutUvarint and Uvarint for storing/retrieving len(s) and using []byte(str) to pass str to some io.Writer. With a string length known from Uvarint, one can buf := make([]byte, n) and pass the buf to some io.Reader.
Prepend the whole thing with length of the string array and repeat the above for all of its items. Reading the whole thing back is again reading first the outer length and repeating n-times the item read.
You can do something like this:
var lines = []string
var ctx = []byte{}
for _, s := range lines {
ctx = append(ctx, []byte(s)...)
}
It can be done easily using strings package. First you need to convert the slice of string to a string.
func Join(elems []string, sep string) string
You need to pass the slice of strings and the separator you need to separate the elements in the string. (examples: space or comma)
Then you can easily convert the string to a slice of bytes by type conversion.
package main
import (
"fmt"
"strings"
)
func main() {
//Slice of Strings
sliceStr := []string{"a","b","c","d"}
fmt.Println(sliceStr) //prints [a b c d]
//Converting slice of String to String
str := strings.Join(sliceStr,"")
fmt.Println(str) // prints abcd
//Converting String to slice of Bytes
sliceByte := []byte(str) //prints [97 98 99 100]
fmt.Println(sliceByte)
//Converting slice of bytes a String
str2 := string(sliceByte)
fmt.Println(str2) // prints abcd
//Converting string to a slice of Strings
sliceStr2 := strings.Split(str2,"")
fmt.Println(sliceStr2) //prints [a b c d]
}

Resources