Why is the method receiver not required to be a pointer when implementing sort.Interface for golang types?

I am reading the docs for the sort stdlib package and the sample code reads like this:
type ByAge []Person
func (a ByAge) Len() int { return len(a) }
func (a ByAge) Swap(i, j int) { a[i], a[j] = a[j], a[i] }
func (a ByAge) Less(i, j int) bool { return a[i].Age < a[j].Age }
As I've learnt, a method that mutates a value of type T needs to use *T as its receiver.
In the case of Len, Swap and Less, why does it work? Or am I misunderstanding the difference between using T and *T as method receivers?

Go has three built-in types that are commonly described as reference types:
map
slice
channel
Every value of these types internally holds a pointer to the actual data. This means that
when you pass a value of one of these types, the value is copied like every other value, but the
internal pointer still points to the same underlying data.
Quick example:
package main

import "fmt"

func dumpFirst(s []int) {
	fmt.Printf("address of slice var: %p, address of element: %p\n", &s, &s[0])
}

func main() {
	s1 := []int{1, 2, 3}
	s2 := s1 // copies the slice header, not the elements
	dumpFirst(s1)
	dumpFirst(s2)
}
will print something like:
address of slice var: 0x1052e110, address of element: 0x1052e100
address of slice var: 0x1052e120, address of element: 0x1052e100
You can see: the address of the slice variable changes but the address of the first element in that slice remains the same.

I just had a minor epiphany regarding this exact same question.
As has already been explained, a typed slice (not a pointer to a slice) can implement the sort.Interface interface; part of the reason for this is that, even though the slice is being copied, one of its fields is a pointer to an array, so any modification of that backing array will be reflected in the original slice.
Normally, though, this isn't enough of a justification for a bare slice to be an acceptable receiver. It's generally incorrect to modify a slice through a value receiver, because any append() call changes the length in the copied slice header without touching the original slice's header. Appending may even allocate a new backing array, completely disconnecting the copied receiver from the original slice.
By the very nature of sorting, however, this isn't a problem in the sort.Sort case. The only array-modifying operation it exercises is Swap, which exchanges elements in place. The backing array keeps the same size, so the slice header (array pointer, length, and capacity) never changes.
I'm sure this was obvious to a lot of people, but it just dawned on me, and I thought it might be useful to others wondering why sort plays nicely with bare slices.
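A small sketch of both halves of that argument, assuming a Person type like the one in the sort docs. The Add method and the sortPeople helper are hypothetical and exist only to contrast an in-place Swap with an append through a value receiver.

```go
package main

import (
	"fmt"
	"sort"
)

type Person struct {
	Name string
	Age  int
}

type ByAge []Person

func (a ByAge) Len() int           { return len(a) }
func (a ByAge) Swap(i, j int)      { a[i], a[j] = a[j], a[i] }
func (a ByAge) Less(i, j int) bool { return a[i].Age < a[j].Age }

// Add is a hypothetical method, not part of sort.Interface. It appends to
// the receiver's copy of the slice header and returns the length it sees,
// but the caller's slice header is left unchanged.
func (a ByAge) Add(p Person) int {
	a = append(a, p)
	return len(a)
}

// sortPeople is a hypothetical helper wrapping sort.Sort.
func sortPeople(people []Person) {
	sort.Sort(ByAge(people))
}

func main() {
	people := []Person{{"Alice", 30}, {"Bob", 25}, {"Carol", 35}}

	sortPeople(people)  // swaps happen in the shared backing array
	fmt.Println(people) // [{Bob 25} {Alice 30} {Carol 35}]

	inside := ByAge(people).Add(Person{"Dave", 20})
	fmt.Println(inside, len(people)) // 4 3
}
```

The sort persists because only elements move; the append is lost because only the copied header grows.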

Related

Does taking the address of a slice element imply a copy of the element in Go?

Let's say a Go 1.18 program has a quite heavy struct, for which copying is to be considered costly:
type MyStruct struct {
	P string
	// a lot of properties
}
Now let's define a function that takes a slice of such elements as an input parameter and whose goal is to update properties of each slice element:
func myFunc(sl []MyStruct) {
	for i := range sl {
		p := &sl[i] // <-- HERE
		p.P = "bar"
		// other properties mutations
	}
}
At the <-- HERE mark, is the Golang compiler making a temporary copy of the slice element into the loop's scope, or is it taking the address of the slice element in-place?
The idea is to avoid copying the whole slice element.
A working example: https://go.dev/play/p/jHOC2DauyrQ?v=goprev
&sl[i] does not copy the slice element, it just evaluates to the address of the ith element.
Slice elements act as variables, and &x evaluates to the address of the x variable. Think about it: since &sl[i] is the address of the ith element, computing it neither needs nor uses the struct value, so why would it make a copy?
If your slices are so big that you're worried about the performance impact of (implicit) copies, you really should consider storing pointers in the slice in the first place, and that way you can make your loop and accessing elements much simpler without having to worry about copies:
func myFunc(sl []*MyStruct) {
	for _, v := range sl {
		v.P = "bar"
		// other properties mutations
	}
}
Also note that if your slice holds non-pointers, and you want to change a field of a slice element, indexing the slice and referring to the field also does not involve copying the struct element:
func myFunc(sl []MyStruct) {
	for i := range sl {
		sl[i].P = "bar"
		// other properties mutations
	}
}
Yes, this may be more verbose and may be less efficient if you have to modify multiple fields (but compilers may also recognize and optimize the evaluation of multiple sl[i] expressions).
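For contrast, here is a short sketch of where a copy does happen: ranging with a value variable. The function names byValue and byIndex are made up for this example, and MyStruct is reduced to a single field.

```go
package main

import "fmt"

type MyStruct struct {
	P string
}

// byValue ranges with a value variable: v is a copy of each element,
// so the assignment never reaches the slice.
func byValue(sl []MyStruct) {
	for _, v := range sl {
		v.P = "bar"
	}
}

// byIndex takes the address of the element in place and writes through it.
func byIndex(sl []MyStruct) {
	for i := range sl {
		p := &sl[i] // pointer into the backing array, no struct copy
		p.P = "bar"
	}
}

func main() {
	a := []MyStruct{{P: "foo"}}

	byValue(a)
	fmt.Println(a[0].P) // foo

	byIndex(a)
	fmt.Println(a[0].P) // bar
}
```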

Can anyone make sense of connStateInterface?

func (c *conn) setState(nc net.Conn, state ConnState) {
	...
	c.curState.Store(connStateInterface[state])
	...
}
// connStateInterface is an array of the interface{} versions of
// ConnState values, so we can use them in atomic.Values later without
// paying the cost of shoving their integers in an interface{}.
var connStateInterface = [...]interface{}{
	StateNew:      StateNew,
	StateActive:   StateActive,
	StateIdle:     StateIdle,
	StateHijacked: StateHijacked,
	StateClosed:   StateClosed,
}
I can't figure out the trick with connStateInterface, how exactly does it work?
There are a few things going on here...
The [...] declaration creates an actual array instead of a slice, so that indirection is removed. What's being declared here is an array of interface{} values... so you might wonder why the weird map-looking notation?
The StateXXX identifiers are simply constants declared further above, of type ConnState, which is an integer type... so the declaration is actually of the form index: value.
Here's a less obfuscated example of that using an array of ints:
var i = [...]int{4: 2, 2: 7}
This will allocate an array containing:
[0, 0, 7, 0, 2]
... note that index 2 has 7, index 4 has 2. Not a common way of declaring an array, but it's valid Go.
So going back to the original declaration, just take the example I gave above, and instead of int, make the array of type interface{}:
var i = [...]interface{}{4: 2, 2: 7}
And you'll get a similar array, but with nil interface values in place of zeroes.
Getting even closer to the original code, the StateXXX constants are just integers, only named constants rather than literals like in my example.
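Here is a runnable sketch of the index: value form, first with literal indices and then with named constants as keys. The State type, its constants, and the names array are hypothetical stand-ins, not the real net/http declarations.

```go
package main

import "fmt"

// State and its constants are hypothetical stand-ins for the StateXXX values.
type State int

const (
	StateNew State = iota
	StateActive
	StateIdle
)

// sparse is sized by its largest index (4), so it has 5 elements.
var sparse = [...]int{4: 2, 2: 7}

// names uses the constants as indices; the order of entries does not matter.
var names = [...]string{
	StateIdle:   "idle",
	StateNew:    "new",
	StateActive: "active",
}

func main() {
	fmt.Println(len(sparse), sparse) // 5 [0 0 7 0 2]
	fmt.Println(names[StateActive])  // active
}
```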
So, what's the point of all this? Why all the obfuscation?
It's a performance hack. The function c.curState.Store() takes an argument of type interface{}. If you were to pass it an int, the compiled code would have to convert the value to interface{} on each call, which may involve allocating storage for the boxed value. A clearer (though obviously impractical) illustration of this might be:
var val interface{}
for i := 0; i < 1000000; i++ {
	// the types are different, compiler has to fumble int vs. interface{}
	val = i
	// do something with val
}
Every time you do val = i, a conversion from int to interface{} needs to happen. The code you posted avoids this by creating a static lookup table where all the values are already of type interface{}.
Therefore, this:
c.curState.Store(connStateInterface[state])
is more efficient than this:
c.curState.Store(state)
That is because state would, in this case, need to undergo the int -> interface{} conversion. In the optimized code, state is merely an index into an array whose elements are already interface{} values... so the int -> interface{} conversion is avoided.
I'm not familiar with that code, but I'd imagine it's in a critical path where the nanoseconds shaved off likely make a difference.
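The same pattern can be sketched outside net/http. Everything named here (Level, levelInterface, setLevel, getLevel) is hypothetical, and whether the lookup actually saves time depends on the compiler and runtime version, so treat it as an illustration of the shape rather than a measured optimization.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Level and levelInterface are hypothetical names for illustration.
type Level int

const (
	Low Level = iota
	High
)

// levelInterface holds each Level already converted to interface{}.
// The conversion happens once, when the package is initialized.
var levelInterface = [...]interface{}{
	Low:  Low,
	High: High,
}

var current atomic.Value

// setLevel stores the pre-built interface value instead of converting l
// at the call site.
func setLevel(l Level) {
	current.Store(levelInterface[l])
}

// getLevel recovers the concrete Level with a type assertion.
func getLevel() Level {
	return current.Load().(Level)
}

func main() {
	setLevel(High)
	fmt.Println(getLevel() == High) // true
}
```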

Go heap.Interface as a struct

I'm creating a priority queue using Go's heap package. There is an example of one in the documentation.
The queue I'm creating needs to be based around a struct rather than a slice because it requires other properties like a mutex.
type PQueue struct {
	queue []*Item
	sync.Mutex
}
I implement all the methods that heap.Interface requires.
The issue is that my PQueue.Push method seems not to be permanently adding a value to PQueue.queue.
func (p PQueue) Push(x interface{}) {
	p.Lock()
	defer p.Unlock()
	item := x.(*Item)
	item.place = len(p.queue) // the index of an item in the queue
	p.queue = append(p.queue, item)
	// len(p.queue) does increase here,
	// but after the function exits, the queue's length has not increased
}
If I print the length of p.queue at the end of this function, the length has increased. After the function exits, however, it seems the original struct does not get updated.
I think it might be happening because of func (p PQueue) not being a pointer. Why might that be? Is there a way to fix it? If I were to use func (p *PQueue) Push(x interface{}) instead, I would need to implement my own heap because heap.Interface specifically requires no pointer. Is that my only option?
The problem is that you are appending to a copy of your slice. Thus the change shows within the function, but is lost once you return from the function.
From the section Passing slices to functions in this blog article:
It's important to understand that even though a slice contains a
pointer, it is itself a value. Under the covers, it is a struct value
holding a pointer and a length. It is not a pointer to a struct.
With append you are modifying the slice header. The article continues:
Thus if we want to write a function that modifies the header, we must
return it as a result parameter
Or:
Another way to have a function modify the slice header is to pass a
pointer to it.
As a result, you need to pass a pointer if you want to modify it with append. Simply change the method to use a pointer receiver. For that to work, you need to call heap.Init with a pointer, like heap.Init(&pq), as shown in the example that you linked to, which does just that and also uses pointer receivers.
From the spec on Method Sets:
The method set of the corresponding pointer type *T is the set of all methods
declared with receiver *T or T (that is, it also contains the method
set of T).
So using a pointer type will work with value and pointer receivers and still implement the interface.
You are right about the problem being related to the receiver of your Push method: the method will receive a copy of the PQueue, so any changes made to the struct will not persist.
Changing the method to use a pointer as a receiver is the correct change, but this also means that PQueue no longer implements heap.Interface. This is due to the fact that Go does not let you take a pointer to the value stored inside an interface variable, so the automatic translation of q.Push() to (&q).Push() does not occur.
This isn't a dead end though, since *PQueue should still implement the heap.Interface. So if you were previously calling heap.Init(q), just change it to heap.Init(&q).
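Putting the advice together, a struct-based queue could look roughly like the sketch below. The Item fields other than place, and the Add and Next wrappers, are assumptions made for this example. Note one deliberate difference from the question: the lock is taken in the wrappers rather than inside Push, so that it covers the whole heap operation (heap.Push calls Push and then Less and Swap).

```go
package main

import (
	"container/heap"
	"fmt"
	"sync"
)

type Item struct {
	value    string // assumed field, for illustration
	priority int    // assumed field, for illustration
	place    int    // the index of the item in the queue
}

type PQueue struct {
	sync.Mutex
	queue []*Item
}

func (p *PQueue) Len() int           { return len(p.queue) }
func (p *PQueue) Less(i, j int) bool { return p.queue[i].priority < p.queue[j].priority }
func (p *PQueue) Swap(i, j int) {
	p.queue[i], p.queue[j] = p.queue[j], p.queue[i]
	p.queue[i].place = i
	p.queue[j].place = j
}

// Push has a pointer receiver, so the append updates the caller's PQueue.
func (p *PQueue) Push(x interface{}) {
	item := x.(*Item)
	item.place = len(p.queue)
	p.queue = append(p.queue, item)
}

// Pop removes and returns the last element, as container/heap expects.
func (p *PQueue) Pop() interface{} {
	old := p.queue
	n := len(old)
	item := old[n-1]
	old[n-1] = nil
	p.queue = old[:n-1]
	return item
}

// Add is a hypothetical wrapper that holds the lock for the whole heap.Push.
func (p *PQueue) Add(item *Item) {
	p.Lock()
	defer p.Unlock()
	heap.Push(p, item)
}

// Next is a hypothetical wrapper that pops the lowest-priority item.
func (p *PQueue) Next() *Item {
	p.Lock()
	defer p.Unlock()
	return heap.Pop(p).(*Item)
}

func main() {
	pq := &PQueue{} // *PQueue implements heap.Interface
	pq.Add(&Item{value: "b", priority: 2})
	pq.Add(&Item{value: "a", priority: 1})
	fmt.Println(pq.Len())        // 2
	fmt.Println(pq.Next().value) // a
}
```

Because pq is a *PQueue, it can be handed straight to heap.Push and heap.Pop, and the embedded mutex is never copied.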
I think it might be happening because of func (p PQueue) not being a pointer
That's right. Quoting Effective Go:
invoking [the method] on a value would cause the method to receive a
copy of the value, so any modifications would be discarded.
You say:
heap.Interface specifically requires no pointer
I'm confused, the example you point to is, in fact, using a pointer:
func (pq *PriorityQueue) Push(x interface{}) {
	n := len(*pq)
	item := x.(*Item)
	item.index = n
	*pq = append(*pq, item)
}
Maybe something else is going on?

Why doesn't a slice []struct behave the same as []builtin?

Slices are references to the underlying array. This makes sense and seems to work on builtin/primitive types, but why is it not working on structs? I assume that even if I update a struct field, the reference/address is still the same.
package main

import "fmt"

type My struct {
	Name string
}

func main() {
	x := []int{1}
	update2(x)
	fmt.Println(x[0])
	update(x)
	fmt.Println(x[0])
	my := My{Name: ""}
	update3([]My{my})
	// Why my[0].Name is not "many" ?
	fmt.Println(my)
}

func update(x []int) {
	x[0] = 999
	return
}

func update2(x []int) {
	x[0] = 1000
	return
}

func update3(x []My) {
	x[0].Name = "many"
	return
}
To clarify: I'm aware that I could use pointers for both cases. I'm only intrigued why the struct is not updated (unlike the int).
What you do when calling update3 is pass a new slice containing a copy of my, and you immediately discard that slice. This is different from what you do with the ints, where you keep the slice.
There are two approaches here.
1) use a slice of pointers instead of a slice of values:
You could define update3 like this:
func update3(x []*My) {
	x[0].Name = "many"
	return
}
and call it using
update3([]*My{&my})
2) write into a slice that you keep (in the same way you deal with the ints)
arr := make([]My,1)
arr[0] = My{Name: ""}
update3(arr)
From the Go FAQ:
As in all languages in the C family, everything in Go is passed by
value. That is, a function always gets a copy of the thing being
passed, as if there were an assignment statement assigning the value
to the parameter. For instance, passing an int value to a function
makes a copy of the int, and passing a pointer value makes a copy of
the pointer, but not the data it points to. (See the next section for
a discussion of how this affects method receivers.)
Map and slice values behave like pointers: they are descriptors that
contain pointers to the underlying map or slice data. Copying a map or
slice value doesn't copy the data it points to.
Thus when you put my into the slice literal, you are storing a copy of your struct, and the calling code won't see any changes made to that copy.
To have the function change the data in the struct, you have to pass a pointer to the struct.
Your third test is not the same as the first two. Look at this playground example. In this case, you do not need to use pointers, as you are not modifying the slice itself. You are modifying an element of the underlying array. If you wanted to modify the slice itself, for instance by appending a new element, you would need to pass a pointer to the slice. Notice that I changed the prints to display the type as well as the value.
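A minimal sketch of the difference, reusing the My type and update3 from the question: keeping the slice in a variable makes the update observable there, while the original my variable stays untouched because the slice literal copied it.

```go
package main

import "fmt"

type My struct {
	Name string
}

func update3(x []My) {
	x[0].Name = "many"
}

func main() {
	my := My{Name: ""}

	// The literal copies my into a new backing array. Keeping the slice
	// in a variable lets us observe the update in that array.
	s := []My{my}
	update3(s)

	fmt.Printf("%q %q\n", s[0].Name, my.Name) // "many" ""
}
```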

Go's value method receiver vs pointer method receiver

I've read a Tour of Go and Effective Go, http://golang.org/doc/effective_go.html#pointers_vs_values, but still have a difficult time understanding when you would define a method on a struct using a value method receiver instead of a pointer method receiver. In other words, when would this:
type ByteSlice []byte
func (slice ByteSlice) Append(data []byte) []byte {
}
be preferable over this?
func (p *ByteSlice) Append(data []byte) {
	slice := *p
	*p = slice
}
Slices are one place where it's not always obvious at first. The slice header is small, so copying it is cheap, and the underlying array is referenced via a pointer, so you can manipulate the contents of a slice with a value receiver. You can see this in the sort package, where the methods for the sortable types are defined without pointers.
The only time you need to use a pointer with a slice is if you're going to manipulate the slice header, which means changing the length or capacity. For an Append method, you would want:
func (p *ByteSlice) Append(data []byte) {
	*p = append(*p, data...)
}
There is an FAQ entry on that matter:
Should I define methods on values or pointers?
First, and most important, does the method need to modify the receiver? If it does, the receiver must be a pointer.
...
Second is the consideration of efficiency. If the receiver is large, a big struct for instance, it will be much cheaper to use a pointer receiver.
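As a rough illustration of the first rule applied to a slice type, the sketch below pairs a value-receiver method that only rewrites existing elements with a pointer-receiver Append. The Upper method is hypothetical and added only for contrast.

```go
package main

import "fmt"

type ByteSlice []byte

// Upper (hypothetical) has a value receiver: it only writes to existing
// elements, which live in the backing array shared with the caller.
func (s ByteSlice) Upper() {
	for i, c := range s {
		if 'a' <= c && c <= 'z' {
			s[i] = c - ('a' - 'A')
		}
	}
}

// Append has a pointer receiver: it changes the length, so it must
// update the caller's slice header.
func (p *ByteSlice) Append(data []byte) {
	*p = append(*p, data...)
}

func main() {
	b := ByteSlice("go")
	b.Upper()
	b.Append([]byte("lang")) // shorthand for (&b).Append(...)
	fmt.Println(string(b))   // GOlang
}
```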
