Does taking the address of a slice element implies a copy of the element in Go? - go

Let's say a Go 1.18 program has a quite heavy struct, for which copying is to be considered costly:
type MyStruct struct {
P string
// a lot of properties
}
Now let's define a function, taking a slice of such elements as input parameter, which goal is to update properties of each slice element:
func myFunc(sl []MyStruct) {
for i := range sl {
p := &sl[i] // <-- HERE
p.P = "bar"
// other properties mutations
}
}
At the <-- HERE mark, is the Golang compiler making a temporary copy of the slice element into the loop's scope, or is it taking the address of the slice element in-place?
The idea is to avoid copying the whole slice element.
A working example: https://go.dev/play/p/jHOC2DauyrQ?v=goprev

&sl[i] does not copy the slice element, it just evaluates to the address of the ith element.
Slice elements act as variables, and &x evaluates to the address of the x variable. Think about it: since &sl[i] is the address of the ith element, the address does not need nor use the struct value, why would it make a copy?
If your slices are so big that you're worried about the performance impact of (implicit) copies, you really should consider storing pointers in the slice in the first place, and that way you can make your loop and accessing elements much simpler without having to worry about copies:
func myFunc(sl []*MyStruct) {
for _, v := range sl {
v.P = "bar"
// other properties mutations
}
}
Also note that if your slice holds non-pointers, and you want to change a field of a slice element, indexing the slice and referring to the field also does not involve copying the struct element:
func myFunc(sl []MyStruct) {
for i := range sl {
sl[i].P = "bar"
// other properties mutations
}
}
Yes, this may be more verbose and may be less efficient if you have to modify multiple fields (but compilers may also recognize and optimize the evaluation of multiple sl[i] expressions).

Related

How to modify field of a struct in a slice?

I have a JSON file named test.json which contains:
[
{
"name" : "john",
"interests" : ["hockey", "jockey"]
},
{
"name" : "lima",
"interests" : ["eating", "poker"]
}
]
Now I have written a golang script which reads the JSON file to an slice of structs, and then upon a condition check, modifies a struct fields by iterating over the slice.
Here is what I've tried so far:
package main
import (
"log"
"strings"
"io/ioutil"
"encoding/json"
)
type subDB struct {
Name string `json:"name"`
Interests []string `json:"interests"`
}
var dbUpdate []subDB
func getJSON() {
// open the file
filename := "test.json"
val, err := ioutil.ReadFile(filename)
if err != nil {
log.Fatal(err)
}
err = json.Unmarshal(val, &dbUpdate)
}
func (v *subDB) Change(newresponse []string) {
v.Interests = newresponse
}
func updater(name string, newinterest string) {
// iterating over the slice of structs
for _, item := range dbUpdate {
// checking if name supplied matches to the current struct
if strings.Contains(item.Name, name) {
flag := false // declare a flag variable
// item.Interests is a slice, so we iterate over it
for _, intr := range item.Interests {
// check if newinterest is within any one of slice value
if strings.Contains(intr, newinterest) {
flag = true
break // if we find one, we terminate the loop
}
}
// if flag is false, then we change the Interests field
// of the current struct
if !flag {
// Interests holds a slice of strings
item.Change([]string{newinterest}) // passing a slice of string
}
}
}
}
func main() {
getJSON()
updater("lima", "jogging")
log.Printf("%+v\n", dbUpdate)
}
The output I'm getting is:
[{Name:john Interests:[hockey jockey]} {Name:lima Interests:[eating poker]}]
However I should be getting an output like:
[{Name:john Interests:[hockey jockey]} {Name:lima Interests:[jogging]}]
My understanding was that since Change() has a pointer passed, it should directly modify the field. Can anyone point me out what I'm doing wrong?
The problem
Let's cite what the language specification says on the for ... range loops:
A "for" statement with a "range" clause iterates through all entries
of an array, slice, string or map, or values received on a channel.
For each entry it assigns iteration values to corresponding iteration
variables if present and then executes the block.
So, in
for _, item := range dbUpdate { ... }
the whole statement forms a scope in which a variable named item is declared and it gets assigned a value of each element of dbUpdate, in turn, form the first to the last — as the statement performs its iterations.
All assignments in Go, always and everywhere do copy the value of the expression being assigned, into a variable receiving that value.
So, when you have
type subDB struct {
Name string `json:"name"`
Interests []string `json:"interests"`
}
var dbUpdate []subDB
you have a slice whose backing array contains a set of elements, each of which has type subDB.
Consequently, when for ... range iterates over your slice, on each iteration a shallow copy of the fields of a subDB value contained in the current slice element is done: the values of those fields are copied into the variable item.
We could re-write what happes as this:
for i := 0; i < len(dbUpdate); i++ {
var item subDB
item = dbUpdate[i]
...
}
As you can see, if you mutate item in the loop's body, the changes you do to it do not in any way affect the collection's element currently being iterated over.
The solutions
Broadly speaking, the solution is to become fully acquainted with the fact that Go is very simple in most of the stuff it implements, and so range is no magic to: the iteration variable is just a variable, and assignment to it is just an assignment.
As to solving the particular problem, there are multiple ways.
Refer to a collection element by its index
Do
for i := range dbUpdate {
dbUpdate[i].FieldName = value
}
A corollary to this is that sometimes, when the element is complex or you'd like to delegate its mutation to some function, you may take a pointer to it:
for i := range dbUpdate {
p := &dbUpdate[i]
mutateSubDB(p)
}
...
func mutateSubDB(p *subDB) {
p.SomeField = someValue
}
Keep pointers in the slice
If your slice were declated like
var dbUpdates []*subDB
…and you'd keep pointers to (usually heap-allocated) SubDB values,
the
for _, ptr := range dbUpdate { ... }
statement would naturally copy a pointer to a SubDB (anonymous) variable into ptr as the slice contains pointers and so the assignment copies a pointer.
Since all pointers containing the same address are pointing to the same value, mutating the target variable through the pointer kept in the iteration variable would mutate the same thing which is pointed to by the slice's element.
Which approach to select should usually depend on considerations other than thinking about how one would iterate over the elements — simply because once you understand why your code did not work, you do not have this problem anymore.
As usually: if your values are really big, consider keeping pointers to them.
If you values need to be referenced from multiple places at the same time, keep pointers to them. In other cases keep the values directly — this greatly improves CPU data cache locality (simply put, by the time you're about to access the next element its contents will most likely have been already fetched from the memory, which does not occur when the CPU has to chase a pointer to access some arbitrary memory location through it).

Properties of slice of structs as function argument without allocating slice

I have a slice of structs that looks like:
type KeyValue struct {
Key uint32
Value uint32
}
var mySlice []Keyvalue
This slice mySlice can have a variable length.
I would like to pass all elements of this slice into a function with this signature:
takeKeyValues(keyValue []uint32)
Now I could easily allocate a slice of []uint32, populate it with all the keys/values of the elements in mySlice and then pass that slice as an argument into takeKeyValues()...
But I'm trying to find a way of doing that without any heap allocations, by directly passing all the keys/values on the stack. Is there some syntax trick which allows me to do that?
There is no safe way to arbitrarily reinterpret the memory layout of the your data. It's up to you whether the performance gain is worth the use of an unsafe type conversion.
Since the fields of KeyValue are of equal size and the struct has no padding, you can convert the underlying array of KeyValue elements to an array of uint32.
takeKeyValues((*[1 << 30]uint32)(unsafe.Pointer(&s[0]))[:len(s)*2])
https://play.golang.org/p/Jjkv9pdFITu
As an alternative to JimB's answer, you can also use the reflect (reflect.SliceHeader) and unsafe (unsafe.Pointer) packages to achieve this.
https://play.golang.org/p/RLrMgoWgI7t
s := []KeyValue{{0, 100}, {1, 101}, {2, 102}, {3, 103}}
var data []uint32
sh := (*reflect.SliceHeader)(unsafe.Pointer(&data))
sh.Data = uintptr(unsafe.Pointer(&s[0]))
sh.Len = len(s) * 2
sh.Cap = sh.Len
fmt.Println(data)
I a bit puzzled by what you want. If the slice is dynamic, you have to grow it dynamically as you add elements to it (at the call site).
Go does not have alloca(3), but I may propose another approach:
Come up with an upper limit of the elements you'd need.
Declare variable of the array type of that size (not a slice).
"Patch" as many elements you want (starting at index 0) in that array.
"Slice" the array and pass the slice to the callee.
Slicing an array does not copy memory — it just takes the address of the array's first element.
Something like this:
func caller(actualSize int) {
const MaxLen = 42
if actualSize > MaxLen {
panic("actualSize > MaxLen")
}
var kv [MaxLen]KeyValue
for i := 0; i < actualSize; i++ {
kv[i].Key, kv[i].Value = 100, 200
}
callee(kv[:actualSize])
}
func callee(kv []KeyValue) {
// ...
}
But note that when the Go compiler sees var kv [MaxLen]KeyValue, it does not have to allocate it on the stack. Basically, I beleive the language spec does not tell anything about the stack vs heap.

Why a slice []struct doesn't behave same as []builtin?

The slices are references to the underlying array. This makes sense and seems to work on builtin/primitive types but why is not working on structs? I assume that even if I update a struct field the reference/address is still the same.
package main
import "fmt"
type My struct {
Name string
}
func main() {
x := []int{1}
update2(x)
fmt.Println(x[0])
update(x)
fmt.Println(x[0])
my := My{Name: ""}
update3([]My{my})
// Why my[0].Name is not "many" ?
fmt.Println(my)
}
func update(x []int) {
x[0] = 999
return
}
func update2(x []int) {
x[0] = 1000
return
}
func update3(x []My) {
x[0].Name = "many"
return
}
To clarify: I'm aware that I could use pointers for both cases. I'm only intrigued why the struct is not updated (unlike the int).
What you do when calling update3 is you pass a new array, containing copies of the value, and you immediately discard the array. This is different from what you do with the primitive, as you keep the array.
There are two approaches here.
1) use an array of pointers instead of an array of values:
You could define update3 like this:
func update3(x []*My) {
x[0].Name = "many"
return
}
and call it using
update3([]*My{&my})
2) write in the array (in the same way you deal with the primitive)
arr := make([]My,1)
arr[0] = My{Name: ""}
update3(arr)
From the GO FAQ:
As in all languages in the C family, everything in Go is passed by
value. That is, a function always gets a copy of the thing being
passed, as if there were an assignment statement assigning the value
to the parameter. For instance, passing an int value to a function
makes a copy of the int, and passing a pointer value makes a copy of
the pointer, but not the data it points to. (See the next section for
a discussion of how this affects method receivers.)
Map and slice values behave like pointers: they are descriptors that
contain pointers to the underlying map or slice data. Copying a map or
slice value doesn't copy the data it points to.
Thus when you pass my you are passing a copy of your struct and the calling code won't see any changes made to that copy.
To have the function change the data in teh struct you have to pass a pointer to the struct.
Your third test is not the same as the first two. Look at this (Playground). In this case, you do not need to use pointers as you are not modifying the slice itself. You are modifying an element of the underlying array. If you wanted to modify the slice, by for instance, appending a new element, you would need to use a pointer to pass the slice by reference. Notice that I changed the prints to display the type as well as the value.

Why is the method receiver not required to be a pointer when implementing sort.Interface for golang types?

I am reading the docs for the sort stdlib package and the sample code reads like this:
type ByAge []Person
func (a ByAge) Len() int { return len(a) }
func (a ByAge) Swap(i, j int) { a[i], a[j] = a[j], a[i] }
func (a ByAge) Less(i, j int) bool { return a[i].Age < a[j].Age }
As I've learnt, function that mutate a type T needs to use *T as its method receiver.
In the case of Len, Swap and Less why does it work ? Or am I misunderstanding the difference between using T vs *T as method receivers ?
Go has three reference types:
map
slice
channel
Every instance of these types holds a pointer to the actual data internally. This means that
when you pass a value of one of these types the value is copied like every other value but the
internal pointer still points to the same value.
Quick example (run on play):
func dumpFirst(s []int) {
fmt.Printf("address of slice var: %p, address of element: %p\n", &s, &s[0])
}
s1 := []int{1, 2, 3}
s2 := s1
dumpFirst(s1)
dumpFirst(s2)
will print something like:
address of slice var: 0x1052e110, address of element: 0x1052e100
address of slice var: 0x1052e120, address of element: 0x1052e100
You can see: the address of the slice variable changes but the address of the first element in that slice remains the same.
I just had a minor epiphany regarding this exact same question.
As has already been explained, a typed slice (not a pointer to a slice) can implement the sort.Interface interface; part of the reason for this is that, even though the slice is being copied, one of its fields is a pointer to an array, so any modification of that backing array will be reflected in the original slice.
Normally, though, this isn't enough of a justification for a bare slice to be an acceptable receiver. It's generally incorrect to try to modify a struct as a receiver of a method, because any append() calls will change the slice copy's length without modifying the original slice's headers. Slice modification may even trigger the initialization of a new backing array, completely disconnecting the copied receiver from the original slice.
By the very nature of sorting, however, this isn't a problem in the sort.Sort case. The only array-modifying operation it exercises is Swap, meaning the array's required memory will remain the same, so the slice will not change size, and therefore there will be no changes to the slice's actual values (starting index, length, and array pointer).
I'm sure this was obvious to a lot of people, but it just dawned on me, and I thought it might be useful to others wondering why sort plays nicely with bare slices.

Call struct method in range loop

example code (edited code snippet): http://play.golang.org/p/eZV4WL-4N_
Why is it that
for x, _ := range body.Personality {
body.Personality[x].Mutate()
}
successfully mutates the structs' contents, but
for _, pf := range body.Personality{
pf.Mutate()
}
does not?
Is it that range creates new instances of each item it iterates over? because the struct does in fact mutate but it doesn't persist.
The range keyword copies the results of the array, thus making it impossible to alter the
contents using the value of range. You have to use the index or a slice of pointers instead of values
if you want to alter the original array/slice.
This behaviour is covered by the spec as stated here. The crux is that an assignment line
x := a[i] copies the value a[i] to x as it is no pointer. Since range is defined to use
a[i], the values are copied.
Your second loop is roughly equivalent to:
for x := range body.Personality {
pf := body.Personality[x]
pf.Mutate()
}
Since body.Personality is an array of structs, the assignment to pf makes a copy of the struct, and that is what we call Mutate() on.
If you want to range over your array in the way you do in the example, one option would be to make it an array of pointers to structs (i.e. []*PFile). That way the assignment in the loop will just take a pointer to the struct allowing you to modify it.

Resources