Slicing an array and string - go

For the code below:
package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    // Declare a string with both Chinese and English characters
    s := "世界 means world"

    // UTFMax is 4 -- up to 4 bytes per encoded rune
    var buf [utf8.UTFMax]byte
    fmt.Println(len(buf))
    fmt.Println(cap(buf))

    // Iterate over the string
    for i, r := range s {
        // Capture the number of bytes for this rune
        rl := utf8.RuneLen(r)
        fmt.Println("Rune length is:", rl)

        // Calculate the slice offset for the bytes associated
        // with this rune
        si := i + rl
        fmt.Println("Index is:", i)
        fmt.Println("Slice offset:", si)

        // Copy the rune's bytes from the string into the buffer
        copy(buf[:], s[i:si])

        // Display the details
        fmt.Printf("%2d: %q; codepoint: %#6x; encoded bytes: %#v\n", i, r, r, buf[:rl])
    }
}
The built-in functions len and cap both give the array length for the array type [4]byte:
var buf [utf8.UTFMax]byte
fmt.Println(len(buf))
fmt.Println(cap(buf))
....
copy(buf[:], s[i:si])
Q1) Does buf[:] create a new slice header that points to array storage buf?
Q2) Is len(buf) performing implicit conversion of buf from array type to slice type?
Q3) Does s[i:si] create a new slice header that points to string s?

buf[:] is a slice whose cap and len are the length of the array, and the backing store is the array buf.
len(buf) returns the length of the array. It does not convert buf to a slice to do that; the length of buf is fixed at compile time.
s[i:si] creates a new string, not a new slice.
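To make the three answers concrete, here is a small sketch (my own example, not code from the question):

package main

import "fmt"

func main() {
    var buf [4]byte

    // Q1: buf[:] is a new slice header over buf's storage, so writing
    // through the slice is visible in the array.
    p := buf[:]
    p[0] = 'A'
    fmt.Println(buf)            // [65 0 0 0]
    fmt.Println(len(p), cap(p)) // 4 4

    // Q2: len(buf) involves no conversion; for an array it is even a
    // compile-time constant.
    const n = len(buf)
    fmt.Println(n) // 4

    // Q3: slicing a string yields a string, not a slice.
    s := "hello"
    sub := s[1:3]
    fmt.Printf("%T %q\n", sub, sub) // string "el"
}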

Related

Why do range and subscription of a string produce different types?

package main

import (
    "fmt"
    "reflect"
)

func main() {
    s := "hello" // Same results with s := "世界"
    for _, x := range s {
        kx := reflect.ValueOf(x).Kind()
        fmt.Printf("Type of x is %v\n", kx)
        break
    }
    y := s[0]
    ky := reflect.ValueOf(y).Kind()
    fmt.Printf("Type of y is %v\n", ky)
}
// Type of x is int32
// Type of y is uint8
I was surprised to learn that I get a different type when I index into a string than when I get the value via range.
Edit: I just realized that even when s is a Unicode string, the type of y is always byte. This also means indexing into a string is unsafe unless the string is pure ASCII.
For statements with range clause: (Link)
For a string value, the "range" clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second value, of type rune, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.
Now let's look at the types: (Link)
// byte is an alias for uint8 and is equivalent to uint8 in all ways. It is
// used, by convention, to distinguish byte values from 8-bit unsigned
// integer values.
type byte = uint8
// rune is an alias for int32 and is equivalent to int32 in all ways. It is
// used, by convention, to distinguish character values from integer values.
type rune = int32
So this explains why a rune is an int32 and a byte is a uint8.
Here's some code to make the point clear. I've extended the example and changed the string to one made of multi-byte characters. I hope the comments are self-explanatory. I'd also recommend reading https://blog.golang.org/strings.
package main

import (
    "fmt"
    "reflect"
)

func main() {
    // Changed the string for better understanding:
    // each character is more than a single byte
    s := "日本語"

    // Range over the string, where x is a rune
    for _, x := range s {
        kx := reflect.ValueOf(x).Kind()
        fmt.Printf(
            "Type of x is %v (%c)\n",
            kx,
            x, // Expected (rune)
        )
        break
    }

    // Indexing (first byte of the string)
    y := s[0]
    ky := reflect.ValueOf(y).Kind()
    fmt.Printf(
        "Type of y is %v (%c)\n",
        ky,
        y,
        /*
            Uh-oh, not expected. We are getting just the first byte
            of the string and not the full multi-byte character.
            But we need '日' (a 3-byte character).
        */
    )

    // Indexing (first rune of the string)
    z := []rune(s)[0]
    kz := reflect.ValueOf(z).Kind()
    fmt.Printf(
        "Type of z is %v (%c)\n",
        kz,
        z, // Expected (rune)
    )
}
Sample output:
Type of x is int32 (日)
Type of y is uint8 (æ)
Type of z is int32 (日)
Note: if your terminal does not show the same output, there may be an issue with its character-encoding settings, and changing that might help.

Size of a byte array golang

I have a []byte object and I want to get its size in bytes. Is there an equivalent to C's sizeof() in Go? If not, can you suggest other ways to get the same result?
To return the number of bytes in a byte slice use the len function:
bs := make([]byte, 1000)
sz := len(bs)
// sz == 1000
If you mean the number of bytes in the underlying array use cap instead:
bs := make([]byte, 1000, 2000)
sz := cap(bs)
// sz == 2000
A byte is guaranteed to be one byte: https://golang.org/ref/spec#Size_and_alignment_guarantees.
I think your best bet would be:
package main

import (
    "encoding/binary"
    "fmt"
)

func main() {
    thousandBytes := make([]byte, 1000)
    tenBytes := make([]byte, 10)
    fmt.Println(binary.Size(tenBytes))
    fmt.Println(binary.Size(thousandBytes))
}
https://play.golang.org/p/HhJif66VwY
Though there are many options, like just importing unsafe and using Sizeof:
import "unsafe"
size := unsafe.Sizeof(bytes)
Note that for some types, like slices, Sizeof is going to give you the size of the slice descriptor, which is likely not what you want. Also, bear in mind that the length and capacity of the slice are different, and the value returned by binary.Size reflects the length.
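Here is a small sketch of that distinction (my own example; the numbers in the comments assume a 64-bit platform):

package main

import (
    "encoding/binary"
    "fmt"
    "unsafe"
)

func main() {
    bs := make([]byte, 1000, 2000)

    fmt.Println(unsafe.Sizeof(bs)) // 24: size of the slice header (pointer, len, cap)
    fmt.Println(len(bs))           // 1000: bytes in the slice
    fmt.Println(cap(bs))           // 2000: bytes in the underlying array
    fmt.Println(binary.Size(bs))   // 1000: follows the length, like len
}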

Append a byte to a string?

How do you append a byte to a string in Go?
var ret string
var b byte
ret += b
invalid operation: ret += b (mismatched types string and byte)
In addition to ThunderCat's answer, you could initialize a bytes.Buffer from a string, allowing you to continue appending bytes as you see fit:
buff := bytes.NewBufferString(ret)
// maybe buff.Grow(n) .. if you hit perf issues?
buff.WriteByte(b)
buff.WriteByte(b)
// ...
result := buff.String()
Here are a few options:
// append byte as slice
ret += string([]byte{b})
// append byte as rune
ret += string(rune(b))
// convert string to byte slice, append byte to slice, convert back to string
ret = string(append([]byte(ret), b))
Benchmark to see which one is best.
If you want to append more than one byte, then break the second option into multiple statements and append to the []byte:
buf := []byte(ret) // convert string to byte slice
buf = append(buf, b) // append byte to slice
buf = append(buf, b1) // append byte to slice
... etc
ret = string(buf) // convert back to string
If you want to append the rune r, then it's a little simpler:
ret += string(r)
Strings are immutable. The code above creates a new string that is a concatenation of the original string and a byte or rune.
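Picking up on "Benchmark to see which one is best": a minimal benchmark sketch (the function names and the sink variable are my own) that could go in a _test.go file and be run with go test -bench=.:

package main

import "testing"

// sink prevents the compiler from optimizing the concatenations away.
var sink string

func BenchmarkAppendByteAsSlice(b *testing.B) {
    for i := 0; i < b.N; i++ {
        ret := "hello"
        ret += string([]byte{'!'})
        sink = ret
    }
}

func BenchmarkAppendByteAsRune(b *testing.B) {
    for i := 0; i < b.N; i++ {
        ret := "hello"
        ret += string(rune('!'))
        sink = ret
    }
}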
It's a lot simpler than either of the other answers:
var ret string = "test"
var b byte = 'a'
ret += string(b)
// returns "testa"
That is, you can just convert an integer to a string and it will treat the integer as a rune (byte is an integer type). And then you can just concatenate the resulting string with +.
Playground: https://play.golang.org/p/ktnUg70M-I
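One caveat worth showing (my own example, not from the answers above): for byte values above 127, converting the byte to a string encodes it as a rune in UTF-8, which is not the same as appending the raw byte.

package main

import "fmt"

func main() {
    var b byte = 0xE6 // first byte of "日"

    asRune := "x" + string(rune(b))          // appends the UTF-8 encoding of U+00E6 ('æ')
    asByte := string(append([]byte("x"), b)) // appends the raw byte 0xE6

    fmt.Printf("%q len=%d\n", asRune, len(asRune)) // "xæ" len=3
    fmt.Printf("%q len=%d\n", asByte, len(asByte)) // "x\xe6" len=2
}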
Another solution, using fmt.Sprintf:
package main

import (
    "fmt"
)

func main() {
    byteArr := []byte{1, 2, 5, 4}
    stringStr := "Test"
    output := fmt.Sprintf("%v %v", byteArr, stringStr)
    fmt.Println("Output: ", output)
}
Output
Output: [1 2 5 4] Test
You can also use strings.Builder:
package main

import "strings"

func main() {
    var b strings.Builder
    b.WriteByte('!')
    println(b.String() == "!")
}
https://golang.org/pkg/strings#Builder.WriteByte

Golang: Convert byte array to big.Int

I'm trying to create an RSA Public Key from a Modulus and Exponent stored in a byte array. After some experimentation I've got the following:
func bytes_to_int(b []byte) (acc uint64) {
    length := len(b)
    if length%4 != 0 {
        extra := 4 - length%4
        b = append([]byte(strings.Repeat("\000", extra)), b...)
        length += extra
    }
    var block uint32
    for i := 0; i < length; i += 4 {
        block = binary.BigEndian.Uint32(b[i : i+4])
        acc = (acc << 32) + uint64(block)
    }
    return
}

func main() {
    fmt.Println(bytes_to_int(data[:128]))
    fmt.Println(bytes_to_int(data[128:]))
}
This appears to work (although I'm not convinced there isn't a better way). My next step was to convert it to use math/big in order to handle larger numbers. I can see an Lsh function to do the << but can't figure out how to recursively add the Uint32(block) to the big.Int.
For reference, the Public Key I'm attempting to import is a Mixmaster Key stored in a keyring (pubring.mix):
http://www.mixmin.net/draft-sassaman-mixmaster-XX.html#key-format
http://pinger.mixmin.net/pubring.mix
You want Int.SetBytes to make a big.Int from a []byte slice.
func (z *Int) SetBytes(buf []byte) *Int
SetBytes interprets buf as the bytes of a big-endian unsigned integer, sets z to that value, and returns z.
This should be quite straightforward to use in your application since your keys are in big-endian format according to the doc you linked.
import "math/big"
z := new(big.Int)
z.SetBytes(byteSliceHere)
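For the RSA use case in the question, a rough sketch of how this could feed an rsa.PublicKey (modBytes and expBytes are my placeholders for the modulus and exponent bytes pulled from the keyring):

package main

import (
    "crypto/rsa"
    "fmt"
    "math/big"
)

func main() {
    // Placeholder values; in the real program these come from the keyring.
    modBytes := []byte{0xba, 0xad, 0xf0, 0x0d} // big-endian modulus bytes
    expBytes := []byte{0x01, 0x00, 0x01}       // big-endian exponent bytes (65537)

    n := new(big.Int).SetBytes(modBytes)
    e := new(big.Int).SetBytes(expBytes)

    pub := rsa.PublicKey{
        N: n,
        E: int(e.Int64()), // rsa.PublicKey stores the exponent as an int
    }
    fmt.Println(pub.N, pub.E)
}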
As Nick mentioned, you can use SetBytes; keep in mind the input is base64-encoded, so you have to decode it first.
Example:
func Base64ToInt(s string) (*big.Int, error) {
    data, err := base64.StdEncoding.DecodeString(s)
    if err != nil {
        return nil, err
    }
    i := new(big.Int)
    i.SetBytes(data)
    return i, nil
}

Convert []string to []byte

I am looking to convert a string slice to a byte slice in Go so I can write it to disk. What is an optimal solution to encode and decode a string slice ([]string) to a byte slice ([]byte)?
I was thinking of iterating over the string slice twice: first to get the actual size needed for the byte slice, and then a second time to write the length and actual string ([]byte(str)) for each element.
The solution must be able to convert the other way as well: from a []byte back to a []string.
Let's ignore the fact that this is Go for a second. The first thing you need is a serialization format to marshal the []string into.
There are many options here. You could build your own or use a library. I am going to assume you don't want to build your own and jump to the serialization formats Go supports.
In all examples, data is the []string and fp is the file you are reading/writing to. Errors are being ignored; check the return values of the functions to handle errors.
Gob
Gob is a Go-only binary format. It should be relatively space efficient as the number of strings increases.
enc := gob.NewEncoder(fp)
enc.Encode(data)
Reading is also simple
var data []string
dec := gob.NewDecoder(fp)
dec.Decode(&data)
Gob is simple and to the point. However, the format is only readable with other Go code.
JSON
Next is JSON, a format used just about everywhere. This format is just as easy to use.
enc := json.NewEncoder(fp)
enc.Encode(data)
And for reading:
var data []string
dec := json.NewDecoder(fp)
dec.Decode(&data)
XML
XML is another common format. However, it has pretty high overhead and is not as easy to use. While you could just do the same as for gob and JSON, proper XML requires a root tag. In this case, we are using the root tag "Strings", and each string is wrapped in an "S" tag.
type Strings struct {
    S []string
}

enc := xml.NewEncoder(fp)
enc.Encode(Strings{data})

var x Strings
dec := xml.NewDecoder(fp)
dec.Decode(&x)
data := x.S
CSV
CSV is different from the others. You have two options: use one record with n fields, or n records with one field each. The following example uses n records. It would be boring if I used one record; it would look too much like the others. CSV can ONLY hold strings.
enc := csv.NewWriter(fp)
for _, v := range data {
    enc.Write([]string{v})
}
enc.Flush()
To read:
var err error
var data []string
dec := csv.NewReader(fp)
for err == nil { // reading ends when an error is reached (perhaps io.EOF)
    var s []string
    s, err = dec.Read()
    if len(s) > 0 {
        data = append(data, s[0])
    }
}
Which format you use is a matter of preference. There are many other possible encodings that I have not mentioned. For example, there is an external library called bencode. I don't personally like bencode, but it works. It is the same encoding used by bittorrent metadata files.
If you want to make your own encoding, encoding/binary is a good place to start. That would allow you to make the most compact file possible, but I hardly think it is worth the effort.
The gob package will do this for you: http://godoc.org/encoding/gob
Example to play with: http://play.golang.org/p/e0FEZm-qiS
The same source code is below.
package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

func main() {
    // store to byte slice
    strs := []string{"foo", "bar"}
    buf := &bytes.Buffer{}
    gob.NewEncoder(buf).Encode(strs)
    bs := buf.Bytes()
    fmt.Printf("%q", bs)

    // Decode it back
    strs2 := []string{}
    gob.NewDecoder(buf).Decode(&strs2)
    fmt.Printf("%v", strs2)
}
To convert []string to []byte:
var str = []string{"str1", "str2"}
var x = []byte{}
for i := 0; i < len(str); i++ {
    b := []byte(str[i])
    for j := 0; j < len(b); j++ {
        x = append(x, b[j])
    }
}
To convert []byte to string:
str := ""
var x = []byte{'c', 'a', 't'}
for i := 0; i < len(x); i++ {
    str += string(x[i])
}
To illustrate the problem of converting []string to []byte and then []byte back to []string, here's a simple solution:
package main

import (
    "encoding/binary"
    "fmt"
)

const maxInt32 = 1<<(32-1) - 1

func writeLen(b []byte, l int) []byte {
    if 0 > l || l > maxInt32 {
        panic("writeLen: invalid length")
    }
    var lb [4]byte
    binary.BigEndian.PutUint32(lb[:], uint32(l))
    return append(b, lb[:]...)
}

func readLen(b []byte) ([]byte, int) {
    if len(b) < 4 {
        panic("readLen: invalid length")
    }
    l := binary.BigEndian.Uint32(b)
    if l > maxInt32 {
        panic("readLen: invalid length")
    }
    return b[4:], int(l)
}

func Decode(b []byte) []string {
    b, ls := readLen(b)
    s := make([]string, ls)
    for i := range s {
        b, ls = readLen(b)
        s[i] = string(b[:ls])
        b = b[ls:]
    }
    return s
}

func Encode(s []string) []byte {
    var b []byte
    b = writeLen(b, len(s))
    for _, ss := range s {
        b = writeLen(b, len(ss))
        b = append(b, ss...)
    }
    return b
}

func codecEqual(s []string) bool {
    return fmt.Sprint(s) == fmt.Sprint(Decode(Encode(s)))
}

func main() {
    var s []string
    fmt.Println("equal", codecEqual(s))
    s = []string{"", "a", "bc"}
    e := Encode(s)
    d := Decode(e)
    fmt.Println("s", len(s), s)
    fmt.Println("e", len(e), e)
    fmt.Println("d", len(d), d)
    fmt.Println("equal", codecEqual(s))
}
Output:
equal true
s 3 [ a bc]
e 19 [0 0 0 3 0 0 0 0 0 0 0 1 97 0 0 0 2 98 99]
d 3 [ a bc]
equal true
I would suggest using PutUvarint and Uvarint for storing/retrieving len(s), and using []byte(str) to pass str to some io.Writer. With the string length known from Uvarint, one can buf := make([]byte, n) and pass buf to some io.Reader.
Prepend the whole thing with the length of the string slice and repeat the above for all of its items. Reading the whole thing back means reading the outer length first and then repeating the per-item read n times.
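A minimal sketch of that scheme (the function names, the bufio wrapper, and the bytes.Buffer used for the demo are my own choices; any io.Writer/io.Reader would do):

package main

import (
    "bufio"
    "bytes"
    "encoding/binary"
    "fmt"
    "io"
)

// encodeStrings writes len(s) as a uvarint, then each string as a
// uvarint length followed by the string's bytes.
func encodeStrings(w io.Writer, s []string) error {
    var lb [binary.MaxVarintLen64]byte
    writeUvarint := func(v uint64) error {
        n := binary.PutUvarint(lb[:], v)
        _, err := w.Write(lb[:n])
        return err
    }
    if err := writeUvarint(uint64(len(s))); err != nil {
        return err
    }
    for _, ss := range s {
        if err := writeUvarint(uint64(len(ss))); err != nil {
            return err
        }
        if _, err := w.Write([]byte(ss)); err != nil {
            return err
        }
    }
    return nil
}

// decodeStrings reads back what encodeStrings wrote.
func decodeStrings(r *bufio.Reader) ([]string, error) {
    count, err := binary.ReadUvarint(r)
    if err != nil {
        return nil, err
    }
    s := make([]string, count)
    for i := range s {
        l, err := binary.ReadUvarint(r)
        if err != nil {
            return nil, err
        }
        buf := make([]byte, l)
        if _, err := io.ReadFull(r, buf); err != nil {
            return nil, err
        }
        s[i] = string(buf)
    }
    return s, nil
}

func main() {
    var buf bytes.Buffer
    if err := encodeStrings(&buf, []string{"", "a", "bc"}); err != nil {
        panic(err)
    }
    out, err := decodeStrings(bufio.NewReader(&buf))
    if err != nil {
        panic(err)
    }
    fmt.Println(out) // [ a bc]
}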
You can do something like this:
var lines []string
var ctx = []byte{}
for _, s := range lines {
    ctx = append(ctx, []byte(s)...)
}
It can be done easily using the strings package. First you need to convert the slice of strings to a single string:
func Join(elems []string, sep string) string
You pass the slice of strings and the separator used to separate the elements in the resulting string (for example, a space or a comma).
Then you can easily convert the string to a slice of bytes by type conversion.
package main

import (
    "fmt"
    "strings"
)

func main() {
    // Slice of strings
    sliceStr := []string{"a", "b", "c", "d"}
    fmt.Println(sliceStr) // prints [a b c d]

    // Converting a slice of strings to a string
    str := strings.Join(sliceStr, "")
    fmt.Println(str) // prints abcd

    // Converting a string to a slice of bytes
    sliceByte := []byte(str)
    fmt.Println(sliceByte) // prints [97 98 99 100]

    // Converting a slice of bytes to a string
    str2 := string(sliceByte)
    fmt.Println(str2) // prints abcd

    // Converting a string to a slice of strings
    sliceStr2 := strings.Split(str2, "")
    fmt.Println(sliceStr2) // prints [a b c d]
}

Resources