Which optimisations does the Go 1.6 compiler apply when converting between []byte and string or vice versa? - go

I know that converting from a []byte to a string, or vice versa, results in a copy of the underlying array being made. This makes sense to me, from the point of view of strings being immutable.
Then I read here that two optimisations get made by the compiler in specific cases:
"The first optimization avoids extra allocations when []byte keys are used to lookup entries in map[string] collections: m[string(key)]."
This makes sense because the conversion is only scoped to the square brackets, so no risk of mutating the string there.
"The second optimization avoids extra allocations in for range clauses where strings are converted to []byte: for i,v := range []byte(str) {...}."
This makes sense because once again - no way of mutating the string here.
Also mentioned is further optimisations on the todo list (not sure which todo list is being referred to), so my question is:
Does any other such (further) optimisations exist in Go 1.6 and if so, what are they?

[]byte to string
For []byte to string conversion, the compiler generates a call to the internal runtime.slicebytetostringtmp function (link to source) when it can prove
that the string form will be discarded before the calling goroutine
could possibly modify the original slice or synchronize with another
goroutine.
runtime.slicebytetostringtmp returns a string referring to the actual []byte bytes, so it does not allocate. The comment in the function says
// First such case is a m[string(k)] lookup where
// m is a string-keyed map and k is a []byte.
// Second such case is "<"+string(b)+">" concatenation where b is []byte.
// Third such case is string(b)=="foo" comparison where b is []byte.
In short, for a b []byte:
map lookup m[string(b)] does not allocate
"<"+string(b)+"> concatenation does not allocate
string(b)=="foo" comparison does not allocate
The second optimization was implemented on 22 Jan 2015, and it is in go1.6
The third optimization was implemented on 27 Jan 2015, and it is in go1.6
So, for example, in the following:
var bs []byte = []byte{104, 97, 108, 108, 111}
func main() {
x := string(bs) == "hello"
println(x)
}
the comparison does not cause allocations in go1.6.
String to []byte
Similarly, the runtime.stringtoslicebytetmp function (link to source) says:
// Return a slice referring to the actual string bytes.
// This is only for use by internal compiler optimizations
// that know that the slice won't be mutated.
// The only such case today is:
// for i, c := range []byte(str)
so i, c := range []byte(str) does not allocate, but you already knew that.

Related

Go vet reports "possible misuse of reflect.SliceHeader"

I have the following code snippet which "go vet" complains about with the warning "possible misuse of reflect.SliceHeader". I can not find very much information about this warning other then this. After reading that it is not very clear to me what is needed to do this in a way that makes go vet happy - and without possible gc issues.
The goal of the snippet is to have a go function copy data to memory which is managed by an opaque C library. The Go function expects a []byte as a parameter.
func Callback(ptr unsafe.Pointer, buffer unsafe.Pointer, size C.longlong) C.longlong {
...
sh := &reflect.SliceHeader{
Data: uintptr(buffer),
Len: int(size),
Cap: int(size),
}
buf := *(*[]byte)(unsafe.Pointer(sh))
err := CopyToSlice(buf)
if err != nil {
log.Fatal("failed to copy to slice")
}
...
}
https://pkg.go.dev/unsafe#go1.19.4#Pointer
Pointer represents a pointer to an arbitrary type. There are four
special operations available for type Pointer that are not available
for other types:
A pointer value of any type can be converted to a Pointer.
A Pointer can be converted to a pointer value of any type.
A uintptr can be converted to a Pointer.
A Pointer can be converted to a uintptr.
Pointer therefore allows a program to defeat the type system and read
and write arbitrary memory. It should be used with extreme care.
The following patterns involving Pointer are valid. Code not using
these patterns is likely to be invalid today or to become invalid in
the future. Even the valid patterns below come with important caveats.
Running "go vet" can help find uses of Pointer that do not conform to
these patterns, but silence from "go vet" is not a guarantee that the
code is valid.
(6) Conversion of a reflect.SliceHeader or reflect.StringHeader Data
field to or from Pointer.
As in the previous case, the reflect data structures SliceHeader and
StringHeader declare the field Data as a uintptr to keep callers from
changing the result to an arbitrary type without first importing
"unsafe". However, this means that SliceHeader and StringHeader are
only valid when interpreting the content of an actual slice or string
value.
var s string
hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1
hdr.Data = uintptr(unsafe.Pointer(p)) // case 6 (this case)
hdr.Len = n
In this usage hdr.Data is really an alternate way to refer to the
underlying pointer in the string header, not a uintptr variable
itself.
In general, reflect.SliceHeader and reflect.StringHeader should be used only as *reflect.SliceHeader and *reflect.StringHeader pointing at actual slices or strings, never as plain structs. A program should not declare or allocate variables of these struct types.
// INVALID: a directly-declared header will not hold Data as a reference.
var hdr reflect.StringHeader
hdr.Data = uintptr(unsafe.Pointer(p))
hdr.Len = n
s := *(*string)(unsafe.Pointer(&hdr)) // p possibly already lost
It looks like JimB (from the comments) hinted upon the most correct answer, though he didn't post it as an answer and he didn't include an example. The following passes go vet, staticcheck, and golangci-lint - and doesn't segfault so I think it is the correct answer.
func Callback(ptr unsafe.Pointer, buffer unsafe.Pointer, size C.longlong) C.longlong {
...
buf := unsafe.Slice((*byte)(buffer), size)
err := CopyToSlice(buf)
if err != nil {
log.Fatal("failed to copy to slice")
}
...
}

Difference in Go between allocating memory by new(Type) and &Type{}

Consider the following example:
type House struct{}
func main() {
house1 := new(House)
house2 := &House{}
fmt.Printf("%T | %T\n", house1, house2)
}
Output: *main.House | *main.House
Both assignments produce a pointer to type House (from package main).
From the go docu of new:
// The new built-in function allocates memory. The first argument is a type,
// not a value, and the value returned is a pointer to a newly
// allocated zero value of that type.
Is memory allocation in both assignments technically the same? Is there a best practice?
Is memory allocation in both assignments technically the same?
I'm not going to talk about implementation details, since those may or may not change.
From the point of view of the language specifications, there is a difference.
Allocating with new(T) follows the rules of allocation (my own note in [italics]):
The built-in function new takes a type T, allocates storage for a variable of that type at run time, and returns a value of type *T pointing to it. The variable is initialized as described in the section on initial values. [its zero value]
Allocating with a composite literal as &T{} follows the rules of composite literals and addressing:
Composite literals construct values for structs, arrays, slices, and maps and create a new value each time they are evaluated.
Taking the address of a composite literal generates a pointer to a unique variable initialized with the literal's value.
So both new(T) and &T{} allocate storage and yield values of type *T. However:
new(T) accepts any type, including predeclared identifiers as int and bool, which may come in handy if you just need to initialize such vars with a one-liner:
n := new(int) // n is type *int and points to 0
b := new(bool) // b is type *bool and points to false
new (quoting Effective Go) "does not initialize1 that memory, it only zeros it". I.e. the memory location will have the type's zero value.
composite literals can be used only for structs, arrays, slices and maps
composite literals can construct non-zero values by explicitly initializing struct fields or array/slice/map items
Bottom line: as Effective Go points out (same paragraph as above):
if a composite literal contains no fields at all, it creates a zero value for the type. The expressions new(T) and &T{} are equivalent.
[1]: there's an inconsistency in the use of the term "initialization" in the specs and Effective Go. The take-away is that with new you can only yield a zero-value, whereas a composite literal allows you to construct non-zero values. And obviously the specs are the source of truth.

Any one can make sense of connStateInterface?

func (c *conn) setState(nc net.Conn, state ConnState) {
...
c.curState.Store(connStateInterface[state])
...
}
// connStateInterface is an array of the interface{} versions of
// ConnState values, so we can use them in atomic.Values later without
// paying the cost of shoving their integers in an interface{}.
var connStateInterface = [...]interface{}{
StateNew: StateNew,
StateActive: StateActive,
StateIdle: StateIdle,
StateHijacked: StateHijacked,
StateClosed: StateClosed,
}
I can't figure out the trick with connStateInterface, how exactly does it work?
There's a few things going on here...
The [...] declaration creates an actual array instead of a slice, so that indirection is removed. What's being declared here is an array of interface{} types... so you might wonder why the weird map-looking notation?
The StateXXX variables are simply constants declared further above, so they are ints... so the declaration is actually of the form index: value.
Here's a less obfuscated example of that using an array of ints:
var i = [...]int{4: 2, 2: 7}
This will allocate an array containing:
[0, 0, 7, 0, 2]
... note that index 2 has 7, index 4 has 2. Not a common way of declaring an array, but it's valid Go.
So going back to the original declaration, just take the example I gave above, and instead of int, make the array of type interface{}:
var i = [...]interface{}{4: 2, 2: 7}
And you'll get a similar array, but with nil interface values in place of zeroes.
Getting even closer to the original code, the StateXXX constants are just ints, only not literals like in my example.
So, what's the point of all this? Why all the obfuscation?
It's a performance hack. The function c.curState.Store() takes an argument of type interface{}. If you were to pass it an int, the compiled code would have to fumble about with converting the type on each call. A more clear (though obviously impractical) illustration of this might be:
var val interface{}
for i := 0; i < 1000000; i++ {
// the types are different, compiler has to fumble int vs. interface{}
val = i
// do something with val
}
Every time you do val = i a conversion between int and interface{} needs to happen. The code you posted avoids this by creating a static lookup table where all the values are already of type interface.
Therefore, this:
c.curState.Store(connStateInterface[state])
is more efficient than this:
c.curState.Store(state)
Since state would, in this case, need to undergo the int -> interface{} conversion. In the optimized code, state is merely an index looking up a value into an array, the result of which gets you an interface{}... so the int -> interface{} type conversion is avoided.
I'm not familiar with that code, but I'd imagine it's in a critical path and the nanoseconds or whatever savings shaved off likely makes a difference.

Map types are reference types. var m map[string]int doesn't point to an initialized map. What doe this mean?

I have read on the golang blog: https://blog.golang.org/go-maps-in-action that:
var m map[string]int
Map types are reference types, like pointers or slices, and so the
value of m above is nil; it doesn't point to an initialized map. A nil
map behaves like an empty map when reading, but attempts to write to a
nil map will cause a runtime panic; don't do that. To initialize a
map, use the built in make function:
m = make(map[string]int)
The make function allocates and initializes a hash map data structure
and returns a map value that points to it.
I have a hard time understanding some parts of this:
What does var m map[string]int do?
Why do I need to write m = make(map[string]int) but not i = make(int)
What does var m map[string]int do?
It tells the compiler that m is a variable of type map[string]int, and assigns "The Zero Value" of the type map[string] int to m (that's why m is nil as nil is The Zero Value of any map).
Why do I need to write m = make(map[string]int) but not i = make(int)
You don't need to. You can create a initialized map also like this:
m = map[string]int{}
which does exactly the same.
The difference between maps and ints is: A nil map is perfectly fine. E.g. len() of a nil map works and is 0. The only thing you cannot do with a nil map is store key-value-pairs. If you want to do this you'll have to prepare/initialize the map. This preparation/initialization in Go is done through the builtin make (or by a literal map as shown above). This initialization process is not needed for ints. As there are no nil ints this initialization would be total noise.
Note that you do not initialize the variable m: The variable m is a map of strings to ints, initialized or not. Like i is a variable for ints. Now ints are directly usable while maps require one more step because the language works that way.
What does var m map[string]int do?
You can think about it like pointer with nil value, it does not point to anything yet but able to point to concrete value.
Why do I need to write m = make(map[string]int) but not i = make(int)
https://golang.org/doc/effective_go.html#allocation_make
Back to allocation. The built-in function make(T, args) serves a purpose different from new(T). It creates slices, maps, and channels only, and it returns an initialized (not zeroed) value of type T (not *T). The reason for the distinction is that these three types represent, under the covers, references to data structures that must be initialized before use. A slice, for example, is a three-item descriptor containing a pointer to the data (inside an array), the length, and the capacity, and until those items are initialized, the slice is nil. For slices, maps, and channels, make initializes the internal data structure and prepares the value for use. For instance,
make([]int, 10, 100)
allocates an array of 100 ints and then creates a slice structure with length 10 and a capacity of 100 pointing at the first 10 elements of the array. (When making a slice, the capacity can be omitted; see the section on slices for more information.) In contrast, new([]int) returns a pointer to a newly allocated, zeroed slice structure, that is, a pointer to a nil slice value.
These examples illustrate the difference between new and make.
var p *[]int = new([]int) // allocates slice structure; *p == nil; rarely useful
var v []int = make([]int, 100) // the slice v now refers to a new array of 100 ints
// Unnecessarily complex:
var p *[]int = new([]int)
*p = make([]int, 100, 100)
// Idiomatic:
v := make([]int, 100)
Remember that make applies only to maps, slices and channels and does not return a pointer. To obtain an explicit pointer allocate with new or take the address of a variable explicitly.
All words have the same length of 32 bits (4 bytes) or 64 bits (8 bytes),
depending on the processor and the operating system. They are identified by their memory address (represented as a hexadecimal number).
All variables of primitive types like int, float, bool, string ... are value types, they point directly to the values contained in the memory. Also composite types like arrays and structs are value types. When assigning with = the value of a value type to another variable: j = i, a copy of the original value i is made in memory.
More complex data which usually needs several words are treated as reference types. A reference type variable r1 contains the address (a number) of the memory location where the value of r1 is stored (or at least the 1st word of it):
For reference types when assigning r2 = r1, only the reference (the address) is copied and not the value!!. If the value of r1 is modified, all references of that value (like r1 and r2) will be reflected.
In Go pointers are reference types, as well as slices, maps and channels. The variables that are referenced are stored in the heap, which is garbage collected.
In the light of the above statements it's clear why the article states:
To initialize a map, use the built in make function.
The make function allocates and initializes a hash map data structure and returns a map value that points to it. This means you can write into it, compare to
var m map[string]int
which is readable, resulting a nil map, but an attempt to write to a nil map will cause a runtime panic. This is the reason why it's important to initialize the map with make.
m = make(map[string]int)

Conversion of a slice of string into a slice of custom type

I'm quite new to Go, so this might be obvious. The compiler does not allow the following code:
(http://play.golang.org/p/3sTLguUG3l)
package main
import "fmt"
type Card string
type Hand []Card
func NewHand(cards []Card) Hand {
hand := Hand(cards)
return hand
}
func main() {
value := []string{"a", "b", "c"}
firstHand := NewHand(value)
fmt.Println(firstHand)
}
The error is:
/tmp/sandbox089372356/main.go:15: cannot use value (type []string) as type []Card in argument to NewHand
From the specs, it looks like []string is not the same underlying type as []Card, so the type conversion cannot occur.
Is it, indeed, the case, or did I miss something?
If it is the case, why is it so? Assuming, in a non-pet-example program, I have as input a slice of string, is there any way to "cast" it into a slice of Card, or do I have to create a new structure and copy the data into it? (Which I'd like to avoid since the functions I'll need to call will modify the slice content).
There is no technical reason why conversion between slices whose elements have identical underlying types (such as []string and []Card) is forbidden. It was a specification decision to help avoid accidental conversions between unrelated types that by chance have the same structure.
The safe solution is to copy the slice. However, it is possible to convert directly (without copying) using the unsafe package:
value := []string{"a", "b", "c"}
// convert &value (type *[]string) to *[]Card via unsafe.Pointer, then deref
cards := *(*[]Card)(unsafe.Pointer(&value))
firstHand := NewHand(cards)
https://play.golang.org/p/tto57DERjYa
Obligatory warning from the package documentation:
unsafe.Pointer allows a program to defeat the type system and read and write arbitrary memory. It should be used with extreme care.
There was a discussion on the mailing list about conversions and underlying types in 2011, and a proposal to allow conversion between recursively equivalent types in 2016 which was declined "until there is a more compelling reason".
The underlying type of Card might be the same as the underlying type of string (which is itself: string), but the underlying type of []Card is not the same as the underlying type of []string (and therefore the same applies to Hand).
You cannot convert a slice of T1 to a slice of T2, it's not a matter of what underlying types they have, if T1 is not identical to T2, you just can't. Why? Because slices of different element types may have different memory layout (different size in memory). For example the elements of type []byte occupy 1 byte each. The elements of []int32 occupy 4 bytes each. Obviously you can't just convert one to the other even if all values are in the range 0..255.
But back to the roots: if you need a slice of Cards, why do you create a slice of strings in the first place? You created the type Card because it is not a string (or at least not just a string). If so and you require []Card, then create []Card in the first place and all your problems go away:
value := []Card{"a", "b", "c"}
firstHand := NewHand(value)
fmt.Println(firstHand)
Note that you are still able to initialize the slice of Card with untyped constant string literals because it can be used to initialize any type whose underlying type is string. If you want to involve typed string constants or non-constant expressions of type string, you need explicit conversion, like in the example below:
s := "ddd"
value := []Card{"a", "b", "c", Card(s)}
If you have a []string, you need to manually build a []Card from it. There is no "easier" way. You can create a helper toCards() function so you can use it everywhere you need it.
func toCards(s []string) []Card {
c := make([]Card, len(s))
for i, v := range s {
c[i] = Card(v)
}
return c
}
Some links for background and reasoning:
Go Language Specification: Conversions
why []string can not be converted to []interface{} in golang
Cannot convert []string to []interface {}
What about memory layout means that []T cannot be converted to []interface in Go?
From the specs, it looks like []string is not the same underlying type as []Card, so the type conversion cannot occur.
Exactly right. You have to convert it by looping and copying over each element, converting the type from string to Card on the way.
If it is the case, why is it so? Assuming, in a non-pet-example program, I have as input a slice of string, is there any way to "cast" it into a slice of Card, or do I have to create a new structure and copy the data into it? (Which I'd like to avoid since the functions I'll need to call will modify the slice content).
Because conversions are always explicit and the designers felt that when a conversion implicitly involves a copy it should be made explicit as well.

Resources