Using Pointers in a for loop

I'm struggling to understand why my code has a bug in one version but not the other. It's been a while since I've covered pointers, so I'm probably rusty!
Basically, I have a repository struct that I use to store objects in memory, and it has a Store method.
type chartsRepository struct {
	mtx    sync.RWMutex
	charts map[ChartName]*Chart
}

func (r *chartsRepository) Store(c *Chart) error {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	r.charts[c.Name] = c
	return nil
}
So all it does is acquire the write lock on the RW mutex and add the pointer to a map, keyed by the chart's name.
Then I've got a function that will basically loop through a slice of these objects, storing them all in the repository.
type service struct {
	charts Repository
}

func (svc *service) StoreCharts(arr []Chart) error {
	hasError := false
	for _, chart := range arr {
		err := svc.charts.Store(&chart)
		// ... error handling
	}
	if hasError {
		// ... Deals with the error object
		return me
	}
	return nil
}
The above doesn't work. Everything seems fine at first, but when I try to access the data later, all the entries in the map point to the same Chart object, despite having different keys.
If I instead move the step that takes the pointer into a separate function, everything works as expected:
func (svc *service) StoreCharts(arr []Chart) error {
	// ...
	for _, chart := range arr {
		err := svc.storeChart(chart)
	}
	// ...
}

func (svc *service) storeChart(c Chart) error {
	return svc.charts.Store(&c)
}
I'm assuming the issue is that because the loop overwrites the reference to the chart in the for loop, the pointer reference also changes. When the pointer is generated in an independent function, that reference is never overwritten. Is that right?
I feel like I'm being stupid, but shouldn't &chart produce a pointer that is independent of the chart variable? I also tried creating a new variable for the pointer, p := &chart, in the for loop, and that didn't work either.
Should I just avoid generating pointers in loops?

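This is because there is only a single loop variable, chart, and each iteration just assigns a new value to it. So if you take the address of the loop variable, you get the same address in every iteration: you store the same pointer each time, and the object it points to (the loop variable) is overwritten on every iteration (after the loop it holds the value assigned in the last iteration). Note that this describes Go versions before 1.22: since Go 1.22, a loop variable declared with := is a fresh variable in each iteration (in modules whose go.mod declares go 1.22 or later), so &chart yields a distinct pointer every time.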
This is mentioned in Spec: For statements: For statements with range clause:
The iteration variables may be declared by the "range" clause using a form of short variable declaration (:=). In this case their types are set to the types of the respective iteration values and their scope is the block of the "for" statement; they are re-used in each iteration. If the iteration variables are declared outside the "for" statement, after execution their values will be those of the last iteration.
Your second version works because passing the loop variable to a function makes a copy of it; you then store the address of that copy, which is detached from the loop variable.
You can achieve the same effect without a function though: just create a local copy and use the address of that:
for _, chart := range arr {
	chart2 := chart
	err := svc.charts.Store(&chart2) // Address of the local var
	// ... error handling
}
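For reference, here is a minimal runnable sketch of the local-copy fix. It assumes a simplified Chart that has only a Name field, drops the service wrapper, and adds a hypothetical newChartsRepository constructor and get helper (neither comes from the question). Because chart2 is declared inside the loop body, the loop behaves the same on any Go version.

```go
package main

import (
	"fmt"
	"sync"
)

// ChartName and Chart are simplified stand-ins for the question's types (assumption).
type ChartName string

type Chart struct {
	Name ChartName
}

type chartsRepository struct {
	mtx    sync.RWMutex
	charts map[ChartName]*Chart
}

// newChartsRepository is a hypothetical constructor; the map must exist before Store is called.
func newChartsRepository() *chartsRepository {
	return &chartsRepository{charts: make(map[ChartName]*Chart)}
}

func (r *chartsRepository) Store(c *Chart) error {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	r.charts[c.Name] = c
	return nil
}

// get is a hypothetical read helper that only takes the read lock.
func (r *chartsRepository) get(name ChartName) (*Chart, bool) {
	r.mtx.RLock()
	defer r.mtx.RUnlock()
	c, ok := r.charts[name]
	return c, ok
}

// storeCharts copies each element into a variable local to the iteration
// and stores the address of that copy.
func storeCharts(r *chartsRepository, arr []Chart) error {
	for _, chart := range arr {
		chart2 := chart // a new variable in every iteration
		if err := r.Store(&chart2); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	repo := newChartsRepository()
	if err := storeCharts(repo, []Chart{{Name: "a"}, {Name: "b"}, {Name: "c"}}); err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range []ChartName{"a", "b", "c"} {
		c, _ := repo.get(name)
		fmt.Println(name, "->", c.Name) // each key maps to its own chart
	}
}
```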
Alternatively, you can store the addresses of the slice elements themselves:
for i := range arr {
	err := svc.charts.Store(&arr[i]) // Address of the slice element
	// ... error handling
}
The disadvantage of this is that, since you store pointers to the slice elements, the whole backing array of the slice has to be kept in memory for as long as you keep any of those pointers (the array cannot be garbage collected). Moreover, the stored pointers share the same Chart values as the slice, so if someone modifies a chart value in the passed slice, that also affects the charts whose pointers you stored.
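The sharing described above can be observed in a small sketch. Chart is again reduced to a Name field (assumption), and storeElementPointers is a hypothetical helper standing in for the repository:

```go
package main

import "fmt"

// Chart is a simplified stand-in for the question's type (assumption).
type Chart struct {
	Name string
}

// storeElementPointers keeps &arr[i] for each element, as in the indexing approach.
func storeElementPointers(arr []Chart) map[string]*Chart {
	m := make(map[string]*Chart, len(arr))
	for i := range arr {
		m[arr[i].Name] = &arr[i] // points into arr's backing array
	}
	return m
}

func main() {
	arr := []Chart{{Name: "a"}, {Name: "b"}}
	stored := storeElementPointers(arr)

	// A change made through the caller's slice is visible through the stored
	// pointers, because both refer to the same backing array.
	arr[0].Name = "changed"
	fmt.Println(stored["a"].Name) // prints "changed"
}
```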
See related questions:
Golang: Register multiple routes using range for loop slices/map
Why do these two for loop variations give me different behavior?

I faced a similar issue today and creating this simple example helped me understand the problem.
// Input slice of string values
inputList := []string{"1", "2", "3"}
// instantiate empty list
outputList := make([]*string, 0)
for _, value := range inputList {
	// print memory address on each iteration
	fmt.Printf("address of %v: %v\n", value, &value)
	outputList = append(outputList, &value)
}
// show the stored pointers
fmt.Printf("%v", outputList)
This printed out (with pre-1.22 loop semantics):
address of 1: 0xc00008e1e0
address of 2: 0xc00008e1e0
address of 3: 0xc00008e1e0
[0xc00008e1e0 0xc00008e1e0 0xc00008e1e0]
As you can see, the address of value in each iteration was always the same even though the actual value was different ("1", "2", and "3"). This is because value was getting reassigned.
In the end, every value in the outputList was pointing to the same address which is now storing the value "3".
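Here is a sketch of the usual fix applied to this example, assuming the goal is one pointer per input value. Redeclaring value inside the loop body creates a new variable per iteration, so each stored pointer refers to its own string. The helper name addresses is made up for illustration.

```go
package main

import "fmt"

// addresses returns one pointer per input element. Copying value into a
// variable declared inside the loop body gives every iteration its own
// variable, so the pointers stay distinct in any Go version.
func addresses(inputList []string) []*string {
	outputList := make([]*string, 0, len(inputList))
	for _, value := range inputList {
		value := value // new variable scoped to this iteration
		outputList = append(outputList, &value)
	}
	return outputList
}

func main() {
	for _, p := range addresses([]string{"1", "2", "3"}) {
		fmt.Printf("%p -> %s\n", p, *p)
	}
}
```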

Related

Duplicate slice of pointers to structs with slightly different values

Given two nested types
type Inner struct {
	InnerVal int
}
type Outer struct {
	InnerStruct *Inner
	OuterVal    int
}
I need to duplicate a slice of pointers to Outer
originalSlice := []*Outer{<plenty_of_items>}
with itself, but having updated field values in the duplicates, including the Outer.InnerStruct.InnerVal.
To do so, I create a new slice of the same type and length as originalSlice, append to it pointers to newly created structs with altered values, and finally append these items to originalSlice:
duplicateSlice := make([]*Outer, len(originalSlice))
for _, originalItem := range originalSlice {
	duplicateSlice = append(duplicateSlice, &Outer{
		InnerStruct: &Inner{
			InnerVal: originalItem.InnerStruct.InnerVal + 1,
		},
		OuterVal: originalItem.OuterVal + 1,
	})
}
originalSlice = append(originalSlice, duplicateSlice...)
This seemed explicit enough to follow the pointers around, or so I thought. However, when I pass the result to a function right afterwards as nowDoubledSlice and access it in a loop
someOtherSlice := make([]*types.Inner, len(nowDoubledSlice))
for i, doubledItem := range nowDoubledSlice {
	someOtherSlice[i] = doubledItem.InnerStruct
}
I get a
runtime error: invalid memory address or nil pointer dereference
Why is that? And is there a more elegant or idiomatic way to duplicate a slice of pointers to structs, while altering the duplicates' fields?
It has nothing to do with your pointer creation; it's your slice allocation. This line:
duplicateSlice := make([]*Outer, len(originalSlice))
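creates a new slice of length len(originalSlice), filled with zero-value elements, which for a pointer type means nil. append then adds the new items after those nil entries, so the nil pointers end up in the doubled slice as well, and accessing doubledItem.InnerStruct on one of them dereferences a nil *Outer. What you likely want instead is: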
duplicateSlice := make([]*Outer, 0, len(originalSlice))
to create a slice of length 0 but capacity len(originalSlice). This works fine.
Alternatively, you could keep make([]*Outer, len(originalSlice)) and use indexing instead of append in your loop:
for i, originalItem := range originalSlice {
	duplicateSlice[i] = &Outer{
		InnerStruct: &Inner{
			InnerVal: originalItem.InnerStruct.InnerVal + 1,
		},
		OuterVal: originalItem.OuterVal + 1,
	}
}
This works just as well.
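Here is a runnable sketch of the length-0, capacity-n approach using the types from the question. duplicateWithIncrement is a hypothetical helper name, and main also shows that make with a length argument pre-fills the slice with nil pointers:

```go
package main

import "fmt"

type Inner struct {
	InnerVal int
}

type Outer struct {
	InnerStruct *Inner
	OuterVal    int
}

// duplicateWithIncrement returns originalSlice followed by modified deep copies
// of its elements. The new slice starts with length 0 and capacity
// len(originalSlice), so append fills it from index 0 instead of after a run
// of nil pointers.
func duplicateWithIncrement(originalSlice []*Outer) []*Outer {
	duplicateSlice := make([]*Outer, 0, len(originalSlice))
	for _, originalItem := range originalSlice {
		duplicateSlice = append(duplicateSlice, &Outer{
			InnerStruct: &Inner{InnerVal: originalItem.InnerStruct.InnerVal + 1},
			OuterVal:    originalItem.OuterVal + 1,
		})
	}
	return append(originalSlice, duplicateSlice...)
}

func main() {
	// With a length argument, make pre-fills the slice with nil pointers.
	withLen := make([]*Outer, 2)
	fmt.Println(len(withLen), cap(withLen), withLen[0] == nil) // 2 2 true

	doubled := duplicateWithIncrement([]*Outer{
		{InnerStruct: &Inner{InnerVal: 1}, OuterVal: 10},
		{InnerStruct: &Inner{InnerVal: 2}, OuterVal: 20},
	})
	for _, item := range doubled {
		fmt.Println(item.InnerStruct.InnerVal, item.OuterVal)
	}
}
```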

How to modify field of a struct in a slice?

I have a JSON file named test.json which contains:
[
	{
		"name": "john",
		"interests": ["hockey", "jockey"]
	},
	{
		"name": "lima",
		"interests": ["eating", "poker"]
	}
]
Now I have written a Go program that reads the JSON file into a slice of structs and then, upon a condition check, modifies a struct's fields by iterating over the slice.
Here is what I've tried so far:
package main

import (
	"encoding/json"
	"log"
	"os"
	"strings"
)

type subDB struct {
	Name      string   `json:"name"`
	Interests []string `json:"interests"`
}

var dbUpdate []subDB

func getJSON() {
	// read the file
	filename := "test.json"
	val, err := os.ReadFile(filename)
	if err != nil {
		log.Fatal(err)
	}
	if err := json.Unmarshal(val, &dbUpdate); err != nil {
		log.Fatal(err)
	}
}

func (v *subDB) Change(newresponse []string) {
	v.Interests = newresponse
}

func updater(name string, newinterest string) {
	// iterating over the slice of structs
	for _, item := range dbUpdate {
		// checking if name supplied matches to the current struct
		if strings.Contains(item.Name, name) {
			flag := false // declare a flag variable
			// item.Interests is a slice, so we iterate over it
			for _, intr := range item.Interests {
				// check if newinterest is within any one of slice value
				if strings.Contains(intr, newinterest) {
					flag = true
					break // if we find one, we terminate the loop
				}
			}
			// if flag is false, then we change the Interests field
			// of the current struct
			if !flag {
				// Interests holds a slice of strings
				item.Change([]string{newinterest}) // passing a slice of string
			}
		}
	}
}

func main() {
	getJSON()
	updater("lima", "jogging")
	log.Printf("%+v\n", dbUpdate)
}
The output I'm getting is:
[{Name:john Interests:[hockey jockey]} {Name:lima Interests:[eating poker]}]
However I should be getting an output like:
[{Name:john Interests:[hockey jockey]} {Name:lima Interests:[jogging]}]
My understanding was that since Change() has a pointer receiver, it should modify the field directly. Can anyone point out what I'm doing wrong?
The problem
Let's cite what the language specification says on the for ... range loops:
A "for" statement with a "range" clause iterates through all entries
of an array, slice, string or map, or values received on a channel.
For each entry it assigns iteration values to corresponding iteration
variables if present and then executes the block.
So, in
for _, item := range dbUpdate { ... }
the whole statement forms a scope in which a variable named item is declared, and, as the statement performs its iterations, item is assigned the value of each element of dbUpdate in turn, from the first to the last.
All assignments in Go, always and everywhere, copy the value of the expression being assigned into the variable receiving that value.
So, when you have
type subDB struct {
	Name      string   `json:"name"`
	Interests []string `json:"interests"`
}
var dbUpdate []subDB
you have a slice whose backing array contains a set of elements, each of which has type subDB.
Consequently, when for ... range iterates over your slice, each iteration makes a shallow copy of the subDB value stored in the current slice element: the values of its fields are copied into the variable item.
We could rewrite what happens like this:
for i := 0; i < len(dbUpdate); i++ {
	var item subDB
	item = dbUpdate[i]
	...
}
As you can see, if you mutate item in the loop's body, the changes you do to it do not in any way affect the collection's element currently being iterated over.
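To see this copy behavior in isolation, here is a reduced sketch (changeViaRangeCopy is an illustrative name, not from the question). Change does run on a pointer, but that pointer is &item, the address of the copy:

```go
package main

import "fmt"

type subDB struct {
	Name      string
	Interests []string
}

func (v *subDB) Change(newresponse []string) {
	v.Interests = newresponse
}

// changeViaRangeCopy calls Change on the iteration variable, which is a copy
// of the element, so the slice itself is left untouched.
func changeViaRangeCopy(db []subDB) {
	for _, item := range db {
		item.Change([]string{"jogging"}) // modifies item, not db[i]
	}
}

func main() {
	db := []subDB{{Name: "lima", Interests: []string{"eating", "poker"}}}
	changeViaRangeCopy(db)
	fmt.Println(db[0].Interests) // [eating poker]
}
```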
The solutions
Broadly speaking, the solution is to become fully acquainted with the fact that Go is very simple in most of what it implements, and range is no magic either: the iteration variable is just a variable, and assignment to it is just an assignment.
As to solving the particular problem, there are multiple ways.
Refer to a collection element by its index
Do
for i := range dbUpdate {
	dbUpdate[i].FieldName = value
}
A corollary to this is that sometimes, when the element is complex or you'd like to delegate its mutation to some function, you may take a pointer to it:
for i := range dbUpdate {
	p := &dbUpdate[i]
	mutateSubDB(p)
}
...
func mutateSubDB(p *subDB) {
	p.SomeField = someValue
}
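Applied to the question's updater, the index-based approach could look like the sketch below. It assumes the slice is passed in as a parameter instead of using the global dbUpdate, and it skips the JSON loading so that it runs on its own:

```go
package main

import (
	"fmt"
	"strings"
)

type subDB struct {
	Name      string   `json:"name"`
	Interests []string `json:"interests"`
}

func (v *subDB) Change(newresponse []string) {
	v.Interests = newresponse
}

// updater takes the address of each slice element (&db[i]) and calls Change
// through it, so the element itself is modified rather than a range copy.
func updater(db []subDB, name, newinterest string) {
	for i := range db {
		item := &db[i] // pointer to the element itself
		if !strings.Contains(item.Name, name) {
			continue
		}
		found := false
		for _, intr := range item.Interests {
			if strings.Contains(intr, newinterest) {
				found = true
				break
			}
		}
		if !found {
			item.Change([]string{newinterest})
		}
	}
}

func main() {
	db := []subDB{
		{Name: "john", Interests: []string{"hockey", "jockey"}},
		{Name: "lima", Interests: []string{"eating", "poker"}},
	}
	updater(db, "lima", "jogging")
	fmt.Printf("%+v\n", db) // [{Name:john Interests:[hockey jockey]} {Name:lima Interests:[jogging]}]
}
```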
Keep pointers in the slice
If your slice were declared like
var dbUpdate []*subDB
and you kept pointers to (usually heap-allocated) subDB values, the
for _, ptr := range dbUpdate { ... }
statement would naturally copy a pointer to a subDB (anonymous) variable into ptr, since the slice contains pointers and so the assignment copies a pointer.
Since all pointers containing the same address are pointing to the same value, mutating the target variable through the pointer kept in the iteration variable would mutate the same thing which is pointed to by the slice's element.
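A minimal sketch of the pointer-slice variant (setInterests is a hypothetical helper). Ranging over []*subDB copies only the pointer, so the write reaches the shared value:

```go
package main

import "fmt"

type subDB struct {
	Name      string
	Interests []string
}

// setInterests ranges over a slice of pointers: ptr is a copy of the pointer,
// so writes through it modify the subDB value the slice element points to.
func setInterests(db []*subDB, name string, interests []string) {
	for _, ptr := range db {
		if ptr.Name == name {
			ptr.Interests = interests
		}
	}
}

func main() {
	db := []*subDB{
		{Name: "john", Interests: []string{"hockey", "jockey"}},
		{Name: "lima", Interests: []string{"eating", "poker"}},
	}
	setInterests(db, "lima", []string{"jogging"})
	fmt.Println(db[1].Interests) // [jogging]
}
```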
Which approach to select should usually depend on considerations other than thinking about how one would iterate over the elements — simply because once you understand why your code did not work, you do not have this problem anymore.
As usual: if your values are really big, consider keeping pointers to them.
If your values need to be referenced from multiple places at the same time, keep pointers to them. In other cases, keep the values directly: this greatly improves CPU data cache locality (simply put, by the time you're about to access the next element, its contents will most likely already have been fetched from memory, which does not happen when the CPU has to chase a pointer to some arbitrary memory location).

Implicit memory aliasing in for loop

I'm using golangci-lint and I'm getting an error on the following code:
var versions []ObjectDescription
... (populate versions) ...
for i, v := range versions {
	res := createWorkerFor(&v)
	...
}
the error is:
G601: Implicit memory aliasing in for loop. (gosec)
res := createWorkerFor(&v)
^
What does "implicit memory aliasing in for loop" mean, exactly? I could not find any error description in the golangci-lint documentation. I don't understand this error.
The warning means, in short, that you are taking the address of a loop variable.
This happens because, in for statements before Go 1.22, the iteration variables are reused. At each iteration, the value of the next element in the range expression is assigned to the iteration variable; v doesn't change, only its value changes. Hence, the expression &v refers to the same location in memory.
Under those pre-1.22 semantics, the following code prints the same memory address four times:
for _, n := range []int{1, 2, 3, 4} {
	fmt.Printf("%p\n", &n)
}
When you store the address of the iteration variable, or when you use it in a closure inside the loop, by the time you dereference the pointer, its value might have changed. Static analysis tools will detect this and emit the warning you see.
Common ways to prevent the issue are:
index the ranged slice or array. This takes the address of the actual element at the i-th position instead of the iteration variable (Go does not allow taking the address of a map element, so this does not apply to maps)
for i := range versions {
	res := createWorkerFor(&versions[i])
}
reassign the iteration variable inside the loop
for _, v := range versions {
	v := v
	res := createWorkerFor(&v) // this is now the address of the inner v
}
with closures, pass the iteration variable as argument to the closure
for _, v := range versions {
	go func(arg ObjectDescription) {
		_ = createWorkerFor(&arg) // safe: arg is a copy owned by this goroutine
	}(v)
}
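The three approaches above can be checked in one program. ObjectDescription, worker, and createWorkerFor are hypothetical stand-ins for the question's types (createWorkerFor here just records the pointer it receives), and the goroutine variant uses sync.WaitGroup so that the caller waits for the workers:

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for the question's types.
type ObjectDescription struct {
	Name string
}

type worker struct {
	desc *ObjectDescription
}

func createWorkerFor(d *ObjectDescription) *worker {
	return &worker{desc: d}
}

// byIndex takes the address of each slice element.
func byIndex(versions []ObjectDescription) []*worker {
	res := make([]*worker, 0, len(versions))
	for i := range versions {
		res = append(res, createWorkerFor(&versions[i]))
	}
	return res
}

// byCopy redeclares v inside the loop body, so each iteration has its own variable.
func byCopy(versions []ObjectDescription) []*worker {
	res := make([]*worker, 0, len(versions))
	for _, v := range versions {
		v := v
		res = append(res, createWorkerFor(&v))
	}
	return res
}

// byArgument passes v to the goroutine's function literal as an argument,
// so each goroutine works with its own copy arg.
func byArgument(versions []ObjectDescription) []*worker {
	res := make([]*worker, len(versions))
	var wg sync.WaitGroup
	for i, v := range versions {
		wg.Add(1)
		go func(i int, arg ObjectDescription) {
			defer wg.Done()
			res[i] = createWorkerFor(&arg) // each goroutine writes a distinct index
		}(i, v)
	}
	wg.Wait()
	return res
}

func main() {
	versions := []ObjectDescription{{Name: "v1"}, {Name: "v2"}, {Name: "v3"}}
	for _, fn := range []func([]ObjectDescription) []*worker{byIndex, byCopy, byArgument} {
		for _, w := range fn(versions) {
			fmt.Print(w.desc.Name, " ")
		}
		fmt.Println()
	}
}
```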
If you dereference the pointer only within the same iteration and you know for sure that nothing leaks it, you might get away with ignoring this check. However, the job of the linter is precisely to report code patterns that could cause issues, so it's a good idea to fix it anyway.
Indexing will solve the problem:
for i := range versions {
	res := createWorkerFor(&versions[i])
	...
}

Difference in behavior between slices and maps

A related question is here: https://stackoverflow.com/a/12965872/6421681.
In Go, you can do:
func numsInFactorial(n int) (nums []int) {
	// `nums := make([]int)` is not needed
	for i := 1; i <= n; i++ {
		nums = append(nums, i)
	}
	return
}
However, the following doesn't work:
func mapWithOneKeyAndValue(k int, v int) (m map[int]int) {
	m[k] = v
	return
}
An error is thrown:
panic: assignment to entry in nil map
Instead, you must:
func mapWithOneKeyAndValue(k int, v int) map[int]int {
	m := make(map[int]int)
	m[k] = v
	return m
}
I can't find the documentation for this behavior.
I have read through all of effective go, and there's no mention of it there either.
I know that named return values are defined (i.e. memory is allocated; close to what new does) but not initialized (so make behavior isn't replicated).
After some experimenting, I believe this behavior can be reduced into understanding the behavior of the following code:
func main() {
	var s []int // len and cap are both 0
	var m map[int]int
	fmt.Println(s) // works... prints an empty slice
	fmt.Println(m) // works... prints an empty map
	s = append(s, 10) // returns a new slice, so underlying array gets allocated
	fmt.Println(s) // works... prints [10]
	m[10] = 10 // program crashes, with "assignment to entry in nil map"
	fmt.Println(m)
}
The issue seems to be that append allocates a new underlying array when it detects that the capacity of s is 0, whereas a map never gets an implicit initialization.
The reason for this SO question is two-pronged. First, I would like to document the behavior on SO. Second, why would the language allow non-initializing declarations of slices and maps? In my experience so far, Go seems to be a pragmatic language (e.g. unused variables lead to compilation failure, and gofmt enforces consistent formatting), so it would make sense for it to prevent such code from compiling.
Try assigning to a nil slice by index and you will get "panic: runtime error: index out of range" (example: https://play.golang.org/p/-XHh1jNyn5g).
The only reason append works with a nil slice is that append can reallocate the given slice.
For example, if you try to append a 6th element to a slice of 5 elements with capacity 5, append creates a new array with a larger capacity, copies all the data from the old one, and returns a slice that refers to the new array. In my understanding, this is just Go's implementation of dynamic arrays.
So a nil slice is just a special case of a slice without enough capacity, and it is reallocated on the first append.
More details on https://blog.golang.org/go-slices-usage-and-internals
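The nil-slice behavior can be demonstrated with recover. assignByIndex is a hypothetical helper that reports whether an index assignment panicked:

```go
package main

import "fmt"

// assignByIndex reports whether s[0] = v panicked.
func assignByIndex(s []int, v int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	s[0] = v
	return false
}

func main() {
	var s []int
	fmt.Println(s == nil, len(s), cap(s)) // true 0 0
	fmt.Println(assignByIndex(s, 10))     // true: index out of range

	s = append(s, 10) // append allocates a backing array
	fmt.Println(s, len(s), cap(s) >= 1) // [10] 1 true
	fmt.Println(assignByIndex(s, 20), s) // false [20]
}
```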
From https://blog.golang.org/go-maps-in-action
A nil map behaves like an empty map when reading, but attempts to write to a nil map will cause a runtime panic; don't do that. To initialize a map, use the built in make function
It seems like a nil map is considered a valid empty map and that's the reason they don't allocate memory for it automatically.
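A similar sketch for maps, again using recover to observe the panic (writeToMap is a hypothetical helper): reads on a nil map return zero values, while a write panics until the map is created with make.

```go
package main

import "fmt"

// writeToMap reports whether m[k] = v panicked.
func writeToMap(m map[int]int, k, v int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	m[k] = v
	return false
}

func main() {
	var m map[int]int
	fmt.Println(m == nil, len(m), m[10]) // true 0 0: reads behave like an empty map
	v, ok := m[10]
	fmt.Println(v, ok)                 // 0 false
	fmt.Println(writeToMap(m, 10, 10)) // true: assignment to entry in nil map

	m = make(map[int]int)
	fmt.Println(writeToMap(m, 10, 10), m) // false map[10:10]
}
```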

How to understand this behavior of goroutine?

package main

import (
	"fmt"
	"time"
)

type field struct {
	name string
}

func (p *field) print() {
	fmt.Println(p.name)
}

func main() {
	data := []field{{"one"}, {"two"}, {"three"}}
	for _, v := range data {
		go v.print()
	}
	<-time.After(1 * time.Second)
}
Why does this code print "three" three times instead of "one", "two", "three" in any order?
There is a data race.
The code implicitly takes address of variable v when evaluating arguments to the goroutine function. Note that the call v.print() is shorthand for the call (&v).print().
The loop changes the value of variable v.
When goroutines execute, it so happens that v has the last value of the loop. That's not guaranteed. It could execute as you expected.
It's helpful and easy to run programs with the race detector (for example, go run -race). This data race is detected and reported by the detector.
One fix is to create another variable scoped to the inside of the loop:
for _, v := range data {
	v := v // short variable declaration of new variable `v`.
	go v.print()
}
With this change, the address of the inner variable v is taken when evaluating the arguments to the goroutine. There is a unique inner variable v for each iteration of the loop.
Yet another way to fix the problem is to use a slice of pointers:
data := []*field{{"one"}, {"two"}, {"three"}} // note '*'
for _, v := range data {
	go v.print()
}
With this change, the individual pointers in the slice are passed to the goroutine, not the address of the range variable v.
Another fix is to use the address of the slice element:
data := []field{{"one"}, {"two"}, {"three"}}
for i := range data {
	v := &data[i]
	go v.print()
}
Because pointer values are typically used with types having a pointer receiver, this subtle issue does not come up often in practice. Because field has a pointer receiver, it would be typical to use []*field instead of []field for the type of data in the question.
If the goroutine function is in an anonymous function, then a common approach for avoiding the issue is to pass the range variables as an argument to the anonymous function:
for _, v := range data {
	go func(v field) {
		v.print() // take address of argument v, not range variable v.
	}(v)
}
Because the code in the question does not already use an anonymous function for the goroutine, the first approach used in this answer is simpler.
As stated above, there's a race condition: its result depends on the scheduling of the different goroutines, so it is neither well defined nor predictable.
For example, if you add time.Sleep(1 * time.Second) inside the loop after starting each goroutine, you are likely to get the expected result, because a goroutine usually prints in less than a second and so still sees the right value of v. But it's a very bad way to fix it.
Go has a dedicated race detector tool that helps find such situations. I recommend reading about it along with the testing documentation; it's definitely worth it.
There’s another way - explicitly pass variable value at goroutine start:
for _, v := range data {
	go func(iv field) {
		iv.print()
	}(v)
}
Here v will be copied to iv (“internal v”) on every iteration and each goroutine will use correct value.

Resources