What happens to pointer to element when Go moves slice to another place in memory? - go

I have the following code
package main

import "fmt"

func main() {
	a := []int{1}
	b := &a[0]
	fmt.Println(a, &a[0], b, *b) // prints [1] 0xc00001c030 0xc00001c030 1
	a = append(a, 1, 2, 3)
	fmt.Println(a, &a[0], b, *b) // prints [1 1 2 3] 0xc000100020 0xc00001c030 1
}
First it creates a slice of 1 int. Its len is 1 and cap is also 1. Then I take a pointer to its first element and get the underlying pointer value in print. It works fine, as expected.
Then I add 3 more elements to the slice, making Go expand the slice's capacity and thus copy it to another place in memory. After that I print the address (by taking a pointer) of the slice's first element, which is now different from the address stored in b.
However, when I then print the value b points to, it also works fine. I don't understand why it works. As far as I know, the slice whose first element b points to was copied to another place in memory, so its previous memory must have been released. However, it seems to still be there.
If we look at maps, Go doesn't even allow us to take a pointer to an element by key because of the exact same problem: the underlying data can be moved to another place in memory. However, it works perfectly fine with slices. Why is that? How does it really work? Is the memory not being freed because there is still a variable which points to it? How is it different from maps?

What happens to pointer to element when Go moves slice to another place in memory?
Nothing.
[W]hen I then print the underlying value of b it also works fine. I don't understand why does it work.
Why wouldn't it work?
The memory location originally pointed to is still there, unaltered. And as long as anything (such as b) still references it, it will remain usable. Once all references to that memory are removed (i.e. go out of scope), then the garbage collector may allow it to be used by something else.
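To make this concrete, here is a small sketch built on the question's own code. The helper name oldAndNew is made up for illustration. It writes through the old pointer after the append and shows that the old array and the new one are now separate pieces of memory.

```go
package main

import "fmt"

// oldAndNew (hypothetical helper) mirrors the question's code. It returns the
// value read through the old pointer, the first element of the grown slice,
// and whether the old pointer still addresses the slice's first element.
func oldAndNew() (viaOld, viaNew int, same bool) {
	a := []int{1}
	b := &a[0] // points into the original one-element backing array

	a = append(a, 1, 2, 3) // capacity 1 is exceeded, so the elements are copied to a new array

	*b = 42 // writes to the old array, which stays alive because b references it
	return *b, a[0], b == &a[0]
}

func main() {
	fmt.Println(oldAndNew()) // 42 1 false
}
```

The write through b is perfectly valid, but it no longer has any effect on a.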

What happens to pointer to element when Go moves slice to another place in memory?
I believe the current GC implementation does not move such objects at all, although the specification allows for this to happen. Unless you use the "unsafe" package, you are unlikely to encounter any problems even if it did move the underlying data structure.

Related

Is it correct to use slice as *[]Item, because Slice is by default pointer

What is the right way to use a slice in Go? As per the Go documentation, a slice is by default a pointer, so is creating a slice as *[]Item the right way? Since slices are pointers by default, isn't this way of creating the slice making it a pointer to a pointer?
I feel the right way to create a slice is []Item or []*Item (a slice holding pointers to items).
A bit of theory
Your question makes no sense as posed: there's no "right" or "wrong", "correct" or "incorrect" here. You can have a pointer to a slice, and you can have a pointer to a pointer to a slice, and you can add levels of such indirection endlessly.
What to do depends on what you need in a particular case.
To help you with the reasoning, I'll try to provide a couple of facts and draw some conclusions.
The first two things to understand about types and values in Go are:
Everything in Go, ever, always, is passed by value.
This means variable assignments (= and :=), passing values to function and method calls, and copying memory which happens internally such as when reallocating backing arrays of slices or rebalancing maps.
Passing by value means that actual bits of the value which is assigned are physically copied into the variable which "receives" the value.
Types in Go—both built-in and user-defined (including those defined in the standard library)—can have value semantics and reference semantics when it comes to assignment.
This one is a bit tricky, and often leads to novices incorrectly assuming that the first rule explained above does not hold.
"The trick" is that if a type contains a pointer (an address of a variable) or consists of a single pointer, the value of this pointer is copied when the value of the type is copied.
What does this mean?
Pretty simple: if you assign the value of a variable of type int to another variable of type int, both variables contain identical bits but they are completely independent: change the content of any of them, and another will be unaffected.
If you assign a variable containing a pointer (or consisting of a single pointer) to another one, they both, again, will contain identical bits and are independent in the sense that changing those bits in any of them will not affect the other.
But since the pointer in both these variables contains the address of the same memory location, using those pointers to modify the contents of the memory location they point at will modify the same memory.
In other words, the difference is that an int does not reference anything while a pointer naturally references another memory location—because it contains its address.
Hence, if a type contains at least a single pointer (it may do so by containing a field of another type which itself contains a pointer, and so on—to any nesting level), values of this type will have reference assignment semantics: if you assign a value to another variable, you end up with two values referencing the same memory location.
That is why maps, slices and strings have reference semantics: when you assign variables of these types both variables point to the same underlying memory location.
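A minimal sketch of the two cases described above. The type pair and the function demo are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// pair is a hypothetical type that contains a pointer, so copies of a pair
// share whatever the pointer refers to.
type pair struct {
	n *int
	k int
}

// demo returns x, y, p.k and *p.n after the assignments below.
func demo() (int, int, int, int) {
	x := 1
	y := x // y gets its own copy of the bits
	y = 2  // x is unaffected

	v := 10
	p := pair{n: &v, k: 1}
	q := p    // copies the pointer and the int
	q.k = 2   // changes only q's own field
	*q.n = 20 // changes the memory that both p.n and q.n point at

	return x, y, p.k, *p.n
}

func main() {
	fmt.Println(demo()) // 1 2 1 20
}
```

Both assignments copy bits; the only difference is that some of the copied bits in pair happen to be an address.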
Let's move on to slices.
Slices vs pointers to slices
A slice, logically, is a struct of three fields: a pointer to the slice's backing array which actually contains the slice's elements, and two ints: the capacity of the slice and its length.
When you pass around and assign a slice value, these struct values are copied: a pointer and two integers.
As you can see, when you pass a slice value around the backing array is not copied—only a pointer to it.
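Here is a short sketch of what copying the three-field header means in practice. The function names setFirst and grow are made up for the example.

```go
package main

import "fmt"

// setFirst receives a copy of the slice header. The copy still points at the
// caller's backing array, so the write is visible to the caller.
func setFirst(s []int, v int) {
	s[0] = v
}

// grow appends to its own copy of the header and returns the result.
// The caller's header (pointer, length, capacity) is not touched.
func grow(s []int) []int {
	return append(s, 99)
}

func main() {
	a := []int{1, 2, 3}
	setFirst(a, 10)
	g := grow(a)
	fmt.Println(a, g) // [10 2 3] [10 2 3 99]
}
```

Element writes go through the shared pointer, while length and capacity live in each copy of the header, which is why append has to return the new header.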
Now let's consider when you want to use a plain slice or a pointer to a slice.
If you're concerned with performance (memory allocation and/or CPU cycles needed to copy memory), these concerns are unfounded: copying three integers when passing around a slice is dirt-cheap on today's hardware.
Using a pointer to a slice would make copying a tiny bit faster—a single integer rather than three—but these savings will be easily offset by two facts:
The slice's value will almost certainly end up being allocated on the heap so that the compiler can be sure its value will survive crossing boundaries of the function calls—so you will pay for using the memory manager and the garbage collector will have more work.
Using a level of indirection reduces data locality: accessing RAM is slow so CPUs have caches which prefetch data at the addresses following the one at which the data is being read. If the control flow immediately reads memory at another location, the prefetched data is thrown away: cache thrashing.
OK, so is there a case when you would want a pointer to a slice?
Yes. For instance, the built-in append function could have been defined as
func append(*[]T, T...)
instead of
func append([]T, T...) []T
(N.B. the T here actually means "any type" because append is not a library function and cannot be sensibly defined in plain Go; so it's sort of pseudocode.)
That is, it could accept a pointer to a slice and possibly replace the slice pointed to by the pointer, so you'd call it as append(&slice, element) and not as slice = append(slice, element).
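As a sketch, such a pointer-taking variant can be written for one concrete element type. The name appendTo is hypothetical, and it is only a thin wrapper around the real built-in append.

```go
package main

import "fmt"

// appendTo is a hypothetical pointer-based variant of append for []int.
// It replaces the slice header that p points to instead of returning a new one.
func appendTo(p *[]int, vals ...int) {
	*p = append(*p, vals...)
}

func main() {
	s := []int{1}
	appendTo(&s, 2, 3) // called as appendTo(&slice, ...) rather than slice = append(slice, ...)
	fmt.Println(s, len(s)) // [1 2 3] 3
}
```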
But honestly, in real-world Go projects I have dealt with, the only case of using pointers to slices which I can remember was about pooling slices which are heavily reused—to save on memory reallocations. And that sole case was only due to sync.Pool keeping elements of type interface{} which may be more effective when using pointers¹.
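A rough sketch of that pooling pattern, assuming a pool of byte buffers. The names bufPool and render, and the initial capacity of 64, are arbitrary choices for the example and not taken from any particular project.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out *[]byte rather than []byte, so putting a buffer back
// stores a single pointer in the interface value.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 64) // the initial capacity is an arbitrary choice
		return &b
	},
}

// render borrows a buffer, builds a greeting in it, and returns the buffer.
func render(name string) string {
	bp := bufPool.Get().(*[]byte)
	buf := (*bp)[:0] // reuse the capacity, discard the old contents
	buf = append(buf, "hello, "...)
	buf = append(buf, name...)
	out := string(buf) // copy the bytes out before the buffer is recycled
	*bp = buf          // keep any growth for the next user
	bufPool.Put(bp)
	return out
}

func main() {
	fmt.Println(render("gopher")) // hello, gopher
}
```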
Slices of values vs slices of pointers to values
Exactly the same logic described above applies to the reasoning about this case.
When you put a value in a slice that value is copied. When the slice needs to grow its backing array, the array will be reallocated, and reallocation means physically copying all existing elements into the new memory location.
So, two considerations:
Are elements reasonably small so that copying them is not going to press on memory and CPU resources?
(Note that "small" vs "big" also heavily depends on the frequency of such copying in a working program: copying a couple of megabytes once in a while is not a big deal; copying even tens of kilobytes in a tight time-critical loop can be a big deal.)
Is your program OK with multiple copies of the same data (for instance, values of certain types like sync.Mutex must not be copied after first use)?
If the answer to either question is "no", you should consider keeping pointers in the slice. But when you consider keeping pointers, also think about the data locality explained above: if a slice contains data intended for time-critical number-crunching, it's better not to make the CPU chase pointers.
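The difference between the two kinds of slices can be seen in a few lines. The type item and the function copiesVsPointers are hypothetical names for this sketch.

```go
package main

import "fmt"

// item is a hypothetical element type.
type item struct{ n int }

// copiesVsPointers stores the same variable in a slice of values and in a
// slice of pointers, then changes the variable.
func copiesVsPointers() (inValues, viaPointers int) {
	v := item{n: 1}
	vals := []item{v}   // stores a copy of v
	ptrs := []*item{&v} // stores the address of v

	v.n = 2 // the copy in vals does not see this, the pointer in ptrs does

	return vals[0].n, ptrs[0].n
}

func main() {
	fmt.Println(copiesVsPointers()) // 1 2
}
```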
To recap: when you ask about a "correct" or "right" way of doing something, the question makes no sense without specifying the set of criteria according to which we could classify all possible solutions to a problem. Still, there are considerations which must be weighed when designing the way you're going to store and manipulate data, and I have tried to explain these considerations.
In general, a rule of thumb regarding slices could be:
Slices are designed to be passed around "as is"—as values, not pointers to variables containing their values.
There are legitimate reasons to have pointers to slices, though.
Most of the time you keep values in the slice's elements, not pointers to variables with these values.
Exceptions to this general rule:
Values you intend to store in a slice occupy too much space so that it looks like the envisioned pattern of using slices of them would involve excessive memory pressure.
Types of values you intend to store in a slice require that they not be copied but only referenced, each existing as a single instance. A good example is a type containing or embedding a field of type sync.Mutex (or, actually, of any other type from the sync package, whose values should not be copied after first use): if you lock a mutex, copy its value and then unlock the copy, the initially locked mutex won't notice, which means you have a grave bug in your code.
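For the mutex case, a sketch of the safe arrangement might look like this. The type counter and the function run are invented for the example. The slice holds pointers, so every goroutine locks the one and only mutex of each counter.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type that must not be copied once in use,
// because it contains a sync.Mutex.
type counter struct {
	mu sync.Mutex
	n  int
}

func (c *counter) inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// run starts the given number of goroutines; each increments every counter once.
func run(workers int) int {
	counters := []*counter{{}, {}} // pointers: each counter exists exactly once

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for _, c := range counters { // c is a copy of the pointer, not of the counter
				c.inc()
			}
		}()
	}
	wg.Wait()
	return counters[0].n + counters[1].n
}

func main() {
	fmt.Println(run(10)) // 20
}
```

With a []counter instead, ranging by value would copy each struct together with its mutex, which is exactly the bug described above.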
A note of caution on correctness vs performance
The text above contains a lot of performance considerations.
I've presented them because Go is a reasonably low-level language: not as low-level as C, C++ or Rust, but still providing the programmer with plenty of wiggle room to use when performance is at stake.
Still, you should understand very well that at this point on your learning curve, correctness must be your top, if not your sole, objective: please take no offence, but if you were after tuning some Go code to shave off some CPU time, you wouldn't be asking your question in the first place.
In other words, please consider all of the above as a set of facts and considerations to guide you in your learning and exploration of the subject, but do not fall into the trap of trying to think about performance first. Make your programs correct and easy to read and modify.
¹ An interface value is a pair of pointers: to the variable containing the value you have put into the interface value and to a special data structure inside the Go runtime which describes the type of that variable.
So while you can put a slice value into a variable of type interface{} directly—in the sense that it's perfectly fine in the language—if the value's type is not itself a single pointer, the compiler will have to allocate on the heap a variable to contain a copy of your value there, and store a pointer to that new variable into the value of type interface{}.
This is needed to hold that "everything is always passed by value" semantics of the Go assignments.
Consequently, if you put a slice value into a variable of type interface{}, you will end up with a copy of that value on the heap.
Because of this, keeping pointers to slices in data structures such as sync.Map makes code uglier but results in less memory churn.

for loop value semantic in golang

First question about Go on SO. The code below shows that n has the same address in each iteration. I am aware that such a for loop is said by some people to have value semantics, and that what's actually ranged over is a copy of the slice, not the slice itself. Why does n have the same address in each iteration? Is it because each element of the slice is copied, rather than the whole slice being copied once beforehand? If only each element of the original slice is copied, can a single memory address then be reused in each iteration?
package main

import (
	"fmt"
)

func main() {
	numbers := []int{1, 2}
	for i, n := range numbers {
		fmt.Println(&n, &numbers[i])
	}
}
A sample result from go playground:
0xc000122030 0xc000122020
0xc000122030 0xc000122028
You are slightly wrong in your question: it is not a copy of the slice that is being iterated over. In Go, when you pass a slice you really pass a pointer to memory plus the size and capacity of that memory; this is called a slice header. The header is copied, but the copy points to the same underlying memory, meaning that when you pass a []int to a function and change the values in that function, the values will be changed in the original []int in the outer code as well.
This is in contrast to an array like [5]int, which is passed by value, meaning it would really be copied when you pass it around. In Go, structs, strings, numbers and arrays are passed by value. Slices are really also passed by value but, as described above, the value in this case contains a pointer to memory. Passing a copy of a pointer still lets you change the memory pointed to.
Now to your experiment:
for i, n := range numbers
will create two variables before the loop starts: integers i and n. In each loop iteration i will be incremented by 1 and n will be assigned the value (a copy of the integer value, that is) of numbers[i]. (This describes Go before version 1.22; since Go 1.22 each iteration gets its own fresh i and n.)
This means there really are only two variables, i and n. They stay the same across iterations, which is what you see in your output.
The addresses of numbers[i] are different of course, they are the memory addresses of the items in the array.
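The practical consequence can be sketched without printing any addresses. The three function names below are made up for the example. It behaves the same on every Go version, because n is a copy of the element either way.

```go
package main

import "fmt"

// doubleCopies changes only the loop variable n, never the slice elements.
func doubleCopies(nums []int) {
	for _, n := range nums {
		n *= 2
		_ = n // the doubled value is discarded at the end of each iteration
	}
}

// doubleInPlace indexes into the slice, so the backing array is modified.
func doubleInPlace(nums []int) {
	for i := range nums {
		nums[i] *= 2
	}
}

// elemPointers returns pointers to the elements themselves, not to n.
func elemPointers(nums []int) []*int {
	ptrs := make([]*int, len(nums))
	for i := range nums {
		ptrs[i] = &nums[i]
	}
	return ptrs
}

func main() {
	nums := []int{1, 2}
	doubleCopies(nums)
	fmt.Println(nums) // [1 2]
	doubleInPlace(nums)
	fmt.Println(nums) // [2 4]
	*elemPointers(nums)[1] = 7
	fmt.Println(nums) // [2 7]
}
```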
The Go Wiki has a Common Mistakes page talking about this exact issue. It also provides an explanation of how to avoid this issue in real code. The quick answer is that this is done for efficiency, and has little to do with the slice. n is a single variable / memory location that gets assigned a new value on each iteration.
If you want additional insight into why this happens under the hood, take a look at this post.

How does gc handle slice memory reclaim

var a = [...]int{1,2,3,4,5,6}
s1 := a[2:4:5]
Suppose s1 goes out of scope later than a. How does gc know to reclaim the memory of s1's underlying array a?
Consider the runtime representation of s1, reflect.SliceHeader:
type SliceHeader struct {
	Data uintptr
	Len  int
	Cap  int
}
The GC doesn't even know about the beginning of a.
Go uses a mark-and-sweep collector in its present implementation.
Per the algorithm, the collector starts from a set of roots and follows pointers from there; on multi-core machines the GC runs concurrently with the program.
The GC traverses everything reachable that way, and whatever it cannot reach is considered free.
The Go runtime also keeps metadata about objects, as stated in this post.
An excerpt:
We needed to have some information about the objects since we didn't have headers. Mark bits are kept on the side and used for marking as well as allocation. Each word has 2 bits associated with it to tell you if it was a scalar or a pointer inside that word. It also encoded whether there were more pointers in the object so we could stop scanning objects sooner than later.
The reason Go's slices (slice headers) are structures instead of pointers to structures is documented by Russ Cox on this page, in the section on slices.
This is an excerpt:
Go originally represented a slice as a pointer to the structure(slice header) , but doing so meant that every slice operation allocated a new memory object. Even with a fast allocator, that creates a lot of unnecessary work for the garbage collector, and we found that, as was the case with strings, programs avoided slicing operations in favor of passing explicit indices. Removing the indirection and the allocation made slices cheap enough to avoid passing explicit indices in most cases.
The size (length) of an array is part of its type. The types [1]int and [2]int are distinct.
One thing to remember is that Go is a value-oriented language: variables store values directly rather than pointers to them.
Arrays such as [3]int are values in Go, so if you pass an array, the whole array is copied.
A [3]int is a single value (one as a whole).
When you write a[1] you are accessing part of that value.
The Data field of SliceHeader says: treat this address as the base of the array, instead of &a[0].
As far as I know:
When one requests a[4], the address
&a[0] + sizeof(type)*4
is calculated.
Now if you are accessing something through a slice s = a[2:4]
and request s[1], what you are requesting is
&a[2] + sizeof(type)*1
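The address arithmetic above can be checked with a small sketch using the array and slice from the question. The function name inspect is made up, and the uintptr values are only used to measure a distance, never converted back to pointers.

```go
package main

import (
	"fmt"
	"unsafe"
)

// inspect returns the length and capacity of s1, whether s1[1] and a[3] are
// the same memory location, and how many elements s1[1] lies past a[0].
func inspect() (length, capacity int, sameElem bool, offset uintptr) {
	a := [...]int{1, 2, 3, 4, 5, 6}
	s1 := a[2:4:5] // base is &a[2], length 4-2, capacity 5-2

	base := uintptr(unsafe.Pointer(&a[0]))
	elem := uintptr(unsafe.Pointer(&s1[1]))

	return len(s1), cap(s1), &s1[1] == &a[3], (elem - base) / unsafe.Sizeof(a[0])
}

func main() {
	fmt.Println(inspect()) // 2 3 true 3
}
```

The slice's pointer lands in the middle of a, and that interior pointer is enough for the collector to keep the whole array alive.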

What is the order that things are happening?

I started playing around with Go by going through the Tour of Go on the official site.
I only have basic experience with programming but on coming up to the channels page I started to play around to try get my head around it and I've ended up quite confused.
This is what I have the code as:
package main

import "fmt"

func sum(s []int, c chan int) {
	sum := 0
	s[0] = 8
	s = append(s, 20)
	fmt.Println(s)
	for _, v := range s {
		sum += v
	}
	c <- sum // send sum to c
}

func main() {
	s := []int{7, 2, 8, -9, 4, 0}
	c := make(chan int)
	go sum(s[:len(s)/2], c)
	fmt.Println(s[0])
	go sum(s[len(s)/2:], c)
	fmt.Println(s)
	x, y := <-c, <-c // receive from c
	fmt.Println(x, y, x+y)
	fmt.Println(s)
}
And this is the result I get:
7
[8 2 8 20 4 0]
[8 2 8 20]
[8 4 0 20]
26 32 58
[8 2 8 8 4 0]
I get that creating a slice gives you an underlying array of the needed size, and that passing a slice to a function and modifying an element modifies the underlying array, but I'm not sure in what order the goroutines are running.
It prints out the first s[0] as 7 even though the function call before it should have modified it to 8, so I'm assuming that the goroutine hasn't run yet?
Then printing the entire array after the second goroutine function call prints it with all the modifications made during the first function call (modifying the first item to 8 and appending 20).
Yet the next 2 line print outs are the print outs of the slice segments from within the functions, which logically by the fact that I just said I saw the modifications done by the function call means they should have printed out BEFORE that line.
Then I'm not sure how it got the calculations or the final printout of the array.
Can someone familiar with how Go operates explain what the logical progression of this code was?
It prints out the first s[0] as 7 even though the function call before it should have modified it to 8, so I'm assuming that the goroutine hasn't run yet?
Yes, this interpretation is correct. (But should not be made, see next section).
Then printing the entire array after the second goroutine function call prints it with all the modifications made during the first function call (modifying the first item to 8 and appending 20).
NO. Never ever think along these lines.
The state of your application at this point is not well defined. 1) You started goroutines which modify the state and you have absolutely no synchronisation between the three goroutines (the main one and the two you started manually). 2) Your code is racy which means it is a) wrong, racy programs are never correct and b) results in undefined behaviour.
Yet the next 2 line print outs are the print outs of the slice segments from within the functions, which logically by the fact that I just said I saw the modifications done by the function call means they should have printed out BEFORE that line.
More or less, if your program would not be racy (see above).
Where your code is racy: main.s is a slice with a backing array, and you subslice it twice when starting the goroutines with go sum(...). Both subslices share the same backing array. Inside the goroutines you write to (s[0] = 8) and append to (append(s, 20)) this subslice, from both goroutines, without synchronisation. The first goroutine will fulfill its append using the large enough backing array, which is concurrently used by the second goroutine.
This results in a concurrent write without synchronisation to the fourth element of main.s, which is racy and thus undefined.
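The overwrite itself can be reproduced without any goroutines. In the sketch below, the function names are made up. The first helper appends to a subslice that still has spare capacity, and the second limits the capacity with a full slice expression, so append has to allocate.

```go
package main

import "fmt"

// appendShared appends to s[:3], which has capacity 6, so the new element
// lands in the shared backing array at index 3.
func appendShared() (s, grown []int) {
	s = []int{7, 2, 8, -9, 4, 0}
	first := s[:3]
	grown = append(first, 20)
	return s, grown
}

// appendCapped appends to s[:3:3], which has capacity 3, so append must
// allocate a new array and s is left alone.
func appendCapped() (s, grown []int) {
	s = []int{7, 2, 8, -9, 4, 0}
	first := s[:3:3]
	grown = append(first, 20)
	return s, grown
}

func main() {
	fmt.Println(appendShared()) // [7 2 8 20 4 0] [7 2 8 20]
	fmt.Println(appendCapped()) // [7 2 8 -9 4 0] [7 2 8 20]
}
```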
By reading twice from the channel (<-c, <-c) you introduce synchronisation between the two manually started goroutines and the main goroutine, and your program returns to serial execution (but its state is still not well defined, as it is racy).
The Println statements you use to track what is going on are problematic too: these statements print stuff which is in a not well defined state because it is accessed concurrently from different goroutines; both may print arbitrary data.
Summing up: your program is racy, its output is undefined, and there is no "reason" or "explanation" why it prints what it prints; it's pure chance.
Always try your code under the race detector (-race) and do not try to make up pseudo-explanations for racy code.
How to fix:
The divide and conquer approach is nice and okay. What is dangerous and thus easy to get wrong (as it happened to you) is modifying the subslice inside the goroutine. While s[0] = 8 is harmless in your case, the append interferes badly with the shared backing array.
Do not modify s inside the goroutine. If you must: Make sure you have a new backing array.
(Do not try to mix slices and concurrency. Both have some subtleties, slices the shared backing array and concurrency the problem of undefined racy behaviour. Especially if you are new to all this. Learn one, then the other, then combine both.)
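One possible race-free shape of the program, as a sketch. The goroutines only read their subslice, and the one that needs to modify its input first copies it into a new backing array. The names sumWithExtra and parallelSum are invented for this example.

```go
package main

import "fmt"

// sum only reads s, so sharing the backing array between goroutines is fine.
func sum(s []int, c chan<- int) {
	total := 0
	for _, v := range s {
		total += v
	}
	c <- total
}

// sumWithExtra needs to modify its input, so it works on a private copy.
func sumWithExtra(s []int, extra int, c chan<- int) {
	own := append([]int(nil), s...) // new backing array, independent of s
	own = append(own, extra)
	sum(own, c)
}

// parallelSum splits s in two halves and sums them concurrently.
func parallelSum(s []int) int {
	c := make(chan int)
	mid := len(s) / 2
	go sum(s[:mid], c)
	go sumWithExtra(s[mid:], 20, c)
	x, y := <-c, <-c // the order of arrival is not defined, the total is
	return x + y
}

func main() {
	s := []int{7, 2, 8, -9, 4, 0}
	fmt.Println(parallelSum(s), s) // 32 [7 2 8 -9 4 0]
}
```

Running it with the race detector enabled (go run -race) should report nothing, since no goroutine writes to memory another one reads.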

Is it safe to hold only unsafe.Pointer on the first element of slice and no refs to that slice itself?

package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

func getPoi() unsafe.Pointer {
	var a = []int{1, 2, 3}
	return unsafe.Pointer(&a[0])
}

func main() {
	p := getPoi()
	runtime.GC()
	// Printf, not Println, so that the %v verb is actually interpreted.
	fmt.Printf("Hello, playground %v\n", *(*int)(unsafe.Pointer(uintptr(p)+8)))
}
output: 3
https://play.golang.org/p/-OQl7KeL9a
I am just examining the abilities of unsafe pointers, trying to minimize the memory overhead of the slice structure (12 bytes).
I wonder whether this example is correct or not.
And if not, what exactly will go wrong after such actions? If it's not correct, why is the value still available even after an explicit call to GC?
Is there any approach to reach minimum storage overhead for something like a 'slice of slices', as it would be in C (just an array of pointers to allocated arrays, where the overhead of each row is sizeof(int*))?
It's possible that, by coincidence, this will work out for you, but I would regard it as unsafe and unsupported. The problem is, if the slice grows beyond its capacity and needs to be reallocated, what happens to your pointer? Really, if you want to optimize performance, you should be using an array. On top of its performance being inherently better and its memory footprint being smaller, this operation would always be safe.
Also, just generally speaking, I see people doing all kinds of stupid things to try and improve performance when their design is inherently poor (like using dynamic arrays or linked lists for no reason). If you need the dynamic growth of a slice then an array isn't really an option (and using that pointer is also most likely unsafe), but in many cases developers just fail to size their collection appropriately out of, idk, laziness? I assume your example is contrived, but in such a scenario you have no reason to ever use a slice since your collection's size is known at compile time. Even if it weren't, oftentimes the size can be determined at runtime in advance of the allocation, and people just fail to do it for the convenience of using abstracted dynamically sized collections.
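If the row length really is known at compile time, a sketch of the array-based layout could look like this. It assumes a fixed row length of 3, and the names newRow and sizes are hypothetical. Each row costs one pointer in the outer slice and can still be indexed safely, without the unsafe package doing any pointer arithmetic.

```go
package main

import (
	"fmt"
	"unsafe"
)

// newRow returns a pointer to a fixed-size array; the length 3 is an
// assumption taken from the question's example.
func newRow() *[3]int {
	return &[3]int{1, 2, 3}
}

// sizes reports the size of a row pointer, of a slice header, and of one
// machine word. unsafe is used here only to measure, not to access memory.
func sizes() (rowPtr, sliceHeader, word uintptr) {
	var p *[3]int
	var s []int
	return unsafe.Sizeof(p), unsafe.Sizeof(s), unsafe.Sizeof(uintptr(0))
}

func main() {
	rows := []*[3]int{newRow(), newRow()}
	rows[0][2] = 30 // indexing through the pointer is bounds-checked
	fmt.Println(rows[0][2], rows[1][2]) // 30 3

	rowPtr, sliceHeader, word := sizes()
	fmt.Println(rowPtr == word, sliceHeader == 3*word) // true true
}
```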
