Why golang slices internal designed like this? - go

Code:
func main() {
a := []int{1, 2}
printSlice("a", a)
b := a[0:1]
printSlice("b origin", b)
b = append(b, 9)
printSlice("b after append b without growing capacity", b)
printSlice("a after append b without growing capacity", a)
b = append(b, 5, 7, 8)
printSlice("a after append b with grown capacity", a)
printSlice("b after append b with grown capacity", b)
b[0] = 1000
printSlice("b", b)
printSlice("a", a)
}
func printSlice(s string, x []int) {
fmt.Printf("%s len=%d cap=%d %v\n",
s, len(x), cap(x), x)
}
Output:
a len=2 cap=2 [1 2]
b origin len=1 cap=2 [1]
b after append b without growing capacity len=2 cap=2 [1 9]
a after append b without growing capacity len=2 cap=2 [1 9]
a after append b with grown capacity len=2 cap=2 [1 9]
b after append b with grown capacity len=5 cap=6 [1 9 5 7 8]
b len=5 cap=6 [1000 9 5 7 8]
a len=2 cap=2 [1 9]
The interesting part is the last two printed lines. I already know that a slice is just a window onto an underlying array. When I reslice within its capacity, the two slices share the same underlying array, but when an append grows a slice beyond its capacity, the two slices end up with distinct underlying arrays. Why did the Go designers choose not to repoint the original slice at the new slice's underlying array, so that both slices would still share the same array? As things stand, whenever I change an element of the newly grown slice I have to check whether its underlying array changed to decide whether the operation has side effects on other slices backed by the old array (see the last two lines of Output). I think that's awkward.

But why golang designers choose not to change the underlying array of the origin slice to the underlying array of the new slice, so as to make both slices still have the same underlying array?
Mainly, slices of the same array can appear absolutely anywhere in the program--completely different functions, packages, and so on. Given how slices are laid out in memory, Go would have to "find" all slices sharing the array to update them; it has no way to.
The approach of some other array-list implementations (like Python lists) is that what you pass around is really a pointer to something like a Go slice, and if two variables hold "the same list", an append using one variable will also show up when you look at the other. That also has some efficiency cost--another pointer lookup to do a[0]. In those circumstances where you really need an append over here to act as an append over there, you can use pointers to slices.
Pointers to slices give you aliasing if you want it but don't provide subslicing. To get everything you ask for, you'd need a different arrangement that I can't think of an example of in the wild (offset, length, and pointer to struct { capacity int; firstElem *type }).
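The pointer-to-slice approach mentioned above can be sketched as follows. This is a minimal illustration, not something from the original answer; the helper name appendVia is hypothetical. Because both variables hold the same pointer, an append that reallocates is still visible through either of them.

```go
package main

import "fmt"

// appendVia (hypothetical helper) appends through a pointer to a slice, so
// every holder of the same pointer sees the new length and, if append had to
// reallocate, the new backing array as well.
func appendVia(p *[]int, v ...int) {
	*p = append(*p, v...)
}

func main() {
	list := []int{1, 2}
	a := &list // a and b point at the same slice header
	b := a

	appendVia(b, 9, 5, 7) // grows past cap 2, so append reallocates
	(*b)[0] = 1000

	fmt.Println(*a) // [1000 2 9 5 7]
	fmt.Println(*b) // [1000 2 9 5 7]
}
```

The cost is the extra indirection on every access, and a subslice taken from *a is again an ordinary slice value with the usual sharing rules.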

Related

Golang append changing append parameter [duplicate]

When running the following program:
package main
import "fmt"
func main() {
edges := [][]int{{1,2}, {2,3}, {3,4}, {1,4}, {1,5}}
printSlice2d(edges)
_ = append(edges[:0], edges[1:]...)
printSlice2d(edges)
}
func printSlice2d(s [][]int) {
fmt.Printf("len=%d cap=%d %v\n", len(s), cap(s), s)
}
I get the following output:
len=5 cap=5 [[1 2] [2 3] [3 4] [1 4] [1 5]]
len=5 cap=5 [[2 3] [3 4] [1 4] [1 5] [1 5]]
I don't understand why edges is changed by calling append. I would expect the first and second lines to be the same. I've checked the specification, but I can't find anything that would explain this behaviour for this kind of input.
A slice uses an array internally to store its data.
The append function does not create a new array unless it needs more room than the current capacity provides. Instead, it copies the new elements into the existing array and returns a new slice that internally uses the same array.
You can read more in this article - https://blog.golang.org/go-slices-usage-and-internals
As mentioned earlier, re-slicing a slice doesn't make a copy of the
underlying array. The full array will be kept in memory until it is no
longer referenced. Occasionally this can cause the program to hold all
the data in memory when only a small piece of it is needed.
This is what is going on in this line:
_ = append(edges[:0], edges[1:]...)
The second argument of append (edges[1:]...) supplies the last 4 items of edges: [{2,3}, {3,4}, {1,4}, {1,5}].
These values are copied into the array that edges uses internally to store its data. That overwrites every item except the last one. This is where edges is mutated.
append returns a new slice that internally uses the same array as edges to store its data.
The returned slice is ignored and will be garbage collected, but that does not matter for edges.
That is why you see changed values when you check edges after performing append on it.
edges[:0] is a slice of length 0 starting at index 0 of the underlying array of edges.
To this slice, you append another slice: the slice of length 4 starting at index 1 of the underlying array of edges. That gives you the first 4 elements of the result you see in the second line.
Then you print edges, which is a slice with an underlying array of 5 elements, whose last 4 you just shifted one position lower. The last element is therefore duplicated in the edges array.
If you look at the result of the append, then you'd see a slice with length 4, cap 5, the first 4 elements of the underlying array of edges.
If you expected the two lines to be the same, maybe you tried to do:
append(edges[:1],edges[1:]...)
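If the goal is to drop the first element without disturbing the original slice, one option is to append onto a nil slice so that a fresh backing array is allocated. The sketch below contrasts that with the in-place variant from the question. The helper name dropFirst is hypothetical, and it assumes the input has at least one element.

```go
package main

import "fmt"

// dropFirst (hypothetical helper) returns a copy of s without its first
// element and leaves s untouched. Appending to a nil slice forces a new
// backing array. It assumes len(s) >= 1; the inner slices are still shared.
func dropFirst(s [][]int) [][]int {
	return append([][]int(nil), s[1:]...)
}

func main() {
	edges := [][]int{{1, 2}, {2, 3}, {3, 4}, {1, 4}, {1, 5}}

	rest := dropFirst(edges)
	fmt.Println(len(rest), rest)   // 4 [[2 3] [3 4] [1 4] [1 5]]
	fmt.Println(len(edges), edges) // 5 [[1 2] [2 3] [3 4] [1 4] [1 5]]

	// In-place variant from the question: the result shares edges' array.
	shifted := append(edges[:0], edges[1:]...)
	fmt.Println(len(shifted), cap(shifted), shifted) // 4 5 [[2 3] [3 4] [1 4] [1 5]]
	fmt.Println(edges)                               // [[2 3] [3 4] [1 4] [1 5] [1 5]]
}
```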

Slicing a sliced reference

I'm taking the tour on Golang site, and I'm trying to digest one of the examples. It is unclear how it works:
package main
import "fmt"
func main() {
s := []int{2, 3, 5, 7, 11, 13}
printSlice(s)
// Slice the slice to give it zero length.
s = s[:0]
printSlice(s)
// Extend its length.
s = s[:4]
printSlice(s)
// Drop its first two values.
s = s[2:]
printSlice(s)
}
func printSlice(s []int) {
fmt.Printf("len=%d cap=%d %v\n", len(s), cap(s), s)
}
The output is:
len=6 cap=6 [2 3 5 7 11 13]
len=0 cap=6 []
len=4 cap=6 [2 3 5 7]
len=2 cap=4 [5 7]
After the first slicing, s = s[:0], the slice length is 0. Then there is another slicing, s = s[:4]. Although the length is 0, this seems to work. But how does this happen? Shouldn't the underlying array be inaccessible from s?
What confuses me more is that the next time we slice it, s = s[2:], we slice the old value of s (which has 4 elements) and not the original array.
Can someone shed some light on the difference between the two cases?
A slice is basically a pointer to memory with some additional information:
1) the number of elements currently used and
2) the capacity, i.e. the remaining length it can occupy.
At the start we create a slice with 6 integers, this makes go create the underlying int array with a total size of 6 as well.
here is your memory locations with addresses (content does not matter here)
* * * * * *
[0][1][2][3][4][5]
^
s points to the start of the memory
len(s) = 6
cap(s) = 6
Next we say: make this slice's len be 0, this is the s = s[:0] which takes a sub-slice of s at position 0 with length 0. Note that s[0:0] is the same, you can omit the first 0.
[0][1][2][3][4][5]
^
s still points to the start of the memory
len(s) = 0
cap(s) = 6
Since the capacity is still the same, we might as well make the length 4 by saying s = s[:4].
* * * *
[0][1][2][3][4][5]
^
s still points to the start of the memory
len(s) = 4
cap(s) = 6
Then we take a sub-slice that does not start at the beginning of the memory by doing s = s[2:].
       *  *
[0][1][2][3][4][5]
       ^
s now points to the original address plus two!
len(s) = 2
cap(s) = 4
Leon pointed me to the Go blog post, which addresses exactly my question.
This is the snippet that helped me understand the concept better:
A slice is a descriptor of an array segment. It consists of a pointer to the array, the length of the segment, and its capacity (the maximum length of the segment).
A slice cannot be grown beyond its capacity. Attempting to do so will cause a runtime panic, just as when indexing outside the bounds of a slice or array. Similarly, slices cannot be re-sliced below zero to access earlier elements in the array.
Slices can be extended if the array has more elements in it, but a slice cannot access elements below its own index 0. It's a window onto the underlying array. The blog post explains it in more depth.

Slices in Go: why does it allow appending more than the capacity allows?

The capacity parameter in making a slice in Go does not make much sense to me. For example,
aSlice := make([]int, 2, 2) //a new slice with length and cap both set to 2
aSlice = append(aSlice, 1, 2, 3, 4, 5) //append integers 1 through 5
fmt.Println("aSlice is: ", aSlice) // prints: aSlice is:  [0 0 1 2 3 4 5]
If the slice allows inserting more elements than the capacity allows, why do we need to set it in the make() function?
The builtin append() function appends to the specified slice in place if it has enough capacity to accommodate the specified elements.
But if the passed slice is not big enough, it allocates a new, big enough backing array, copies the elements from the passed slice into it, appends the new elements, and returns a slice that refers to this new array. Quoting from the append() documentation:
The append built-in function appends elements to the end of a slice. If it has sufficient capacity, the destination is resliced to accommodate the new elements. If it does not, a new underlying array will be allocated. Append returns the updated slice. It is therefore necessary to store the result of append, often in the variable holding the slice itself:
When making a slice with make if the length and capacity are the same, the capacity can be omitted, in which case it is defaulted to the specified length:
// These 2 declarations are equivalent:
s := make([]int, 2, 2)
s := make([]int, 2)
Also note that append() appends elements after the last element of the slice. And the above slices already have len(s) == 2 right after declaration so if you append even just 1 element to it, it will cause a reallocation as seen in this example:
s := make([]int, 2, 2)
fmt.Println(s, len(s), cap(s))
s = append(s, 1)
fmt.Println(s, len(s), cap(s))
Output:
[0 0] 2 2
[0 0 1] 3 4
So in your example what you should do is something like this:
s := make([]int, 0, 10) // Create a slice with length=0 and capacity=10
fmt.Println(s, len(s), cap(s))
s = append(s, 1)
fmt.Println(s, len(s), cap(s))
Output:
[] 0 10
[1] 1 10
I recommend the following blog articles if you want to understand slices in more details:
Go Slices: usage and internals
Arrays, slices (and strings): The mechanics of 'append'
It is mainly an optimization, and it is not unique to Go; similar structures in other languages have this as well.
When you append more than the capacity, the runtime needs to allocate more memory for the new elements. This is costly and can also cause memory fragmentation.
By specifying the capacity, the runtime allocates what is needed in advance, and avoids reallocations. However if you do not know the estimated capacity in advance or it changes, you do not have to set it, and the runtime reallocates what is needed and grows the capacity itself.
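To make the effect of preallocation visible, one can count how often the capacity changes while appending. The sketch below uses a hypothetical helper, countGrowths; the exact number of growths for an unsized slice depends on the Go runtime's growth policy, so only "zero" versus "more than zero" is shown.

```go
package main

import "fmt"

// countGrowths (hypothetical helper) appends n ints to s and counts how often
// cap(s) changes, which is when append had to allocate a new backing array.
func countGrowths(s []int, n int) int {
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			growths++
		}
	}
	return growths
}

func main() {
	// No capacity hint: append has to grow the array several times.
	fmt.Println(countGrowths(nil, 1000) > 0) // true

	// Capacity known up front: no reallocation at all.
	fmt.Println(countGrowths(make([]int, 0, 1000), 1000)) // 0
}
```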

golang slice, slicing a slice with slice[a:b:c]

I read go slice usage and internals, Slice, and Effective go#slice, but there is nothing about slicing a slice with 3 numbers like this: slice[a:b:c]
For example this code :
package main
import "fmt"
func main() {
var s = []string{"a", "b", "c", "d", "e", "f", "g"}
fmt.Println(s[1:2:6], len(s[1:2:6]), cap(s[1:2:6]))
fmt.Println(s[1:2:5], len(s[1:2:5]), cap(s[1:2:5]))
fmt.Println(s[1:2], len(s[1:2]), cap(s[1:2]))
}
go playground result is this :
[b] 1 5
[b] 1 4
[b] 1 6
I can understand that the third number has something to do with capacity, but what exactly does it mean?
Did I miss something in the documentation?
The syntax has been introduced in Go 1.2, as I mentioned in "Re-slicing slices in Golang".
It is documented in Full slice expressions:
a[low : high : max]
constructs a slice of the same type, and with the same length and elements as the simple slice expression a[low : high].
Additionally, it controls the resulting slice's capacity by setting it to max - low.
Only the first index may be omitted; it defaults to 0.
After slicing the array a:
a := [5]int{1, 2, 3, 4, 5}
t := a[1:3:5]
the slice t has type []int, length 2, capacity 4, and elements
t[0] == 2
t[1] == 3
The design document for that feature had the following justification:
It would occasionally be useful, for example in custom []byte allocation managers, to be able to hand a slice to a caller and know that the caller cannot edit values beyond a given subrange of the true array.
The addition of append to the language made this somewhat more important, because append lets programmers overwrite entries between len and cap without realizing it or even mentioning cap.
2022: svitanok adds for Go 1.19+:
As long as a "derivative" slice does not outgrow the capacity set by the third index when it was created, it still refers to the same spot in memory as its original ("true") slice, so changes applied to it will affect the original slice.
If you then, for example, append enough elements to this derivative slice to force its capacity to grow, the new slice will occupy a different place in memory, and changes made to it will no longer affect the slice it originated from.
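The protection described in the design justification can be seen in a small experiment. This is an illustrative sketch; the helper appendAfter and its capped flag are hypothetical names. With a simple slice expression the append overwrites the caller's next element, while with a full slice expression it has to allocate.

```go
package main

import "fmt"

// appendAfter (hypothetical helper) takes a one-element window starting at
// s[1], appends v to it and returns the window. With capped set, the window
// is built with a full slice expression so that its capacity equals its length.
func appendAfter(s []string, v string, capped bool) []string {
	w := s[1:2] // len 1, cap len(s)-1: append may write into s[2]
	if capped {
		w = s[1:2:2] // len 1, cap 1: append must allocate a new array
	}
	return append(w, v)
}

func main() {
	s := []string{"a", "b", "c", "d"}
	w := appendAfter(s, "X", false)
	fmt.Println(s, w) // [a b X d] [b X]

	s = []string{"a", "b", "c", "d"}
	w = appendAfter(s, "X", true)
	fmt.Println(s, w) // [a b c d] [b X]
}
```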
In a slice expression slice[a:b:c] or aSlice[1:3:5]
a:b or 1:3 -> gives length
a:c or 1:5 -> gives capacity
We can extract both length and capacity from a slice expression with 3 numbers/indices, without looking at the source slice/array.
Expression | aSlice[low:high:max]  or aSlice[a:b:c]    or aSlice[1:3:7]
------------------------------------------------------------------------
Length     | len(aSlice[low:high]) or len(aSlice[a:b]) or len(aSlice[1:3])
Capacity   | len(aSlice[low:max])  or len(aSlice[a:c]) or len(aSlice[1:7])
------------------------------------------------------------------------
Read more here at Slice Expressions
Playground
A Go slice holds a pointer to an underlying array, together with a length and a capacity. It can be pictured as
pointer:length:capacity
and append adds new elements after the current length.
sl1 := make([]int, 6)
fmt.Println(sl1)
sl2 := append(sl1, 1)
fmt.Println(sl2)
[0 0 0 0 0 0]
[0 0 0 0 0 0 1]

Does Go have no real way to shrink a slice? Is that an issue?

I've been trying out Go for some time and this question keeps bugging me. Say I build up a somewhat large dataset in a slice (say, 10 million int64s).
package main
import (
"math"
"fmt"
)
func main() {
var a []int64
var i int64;
upto := int64(math.Pow10(7))
for i = 0; i < upto; i++ {
a = append(a, i)
}
fmt.Println(cap(a))
}
But then I decide I don't want most of them so I want to end up with a slice of just 10 of those. I've tried both slicing and delete techniques on Go's wiki but none of them seem to reduce the slice's capacity.
So that's my question: does Go have no real way of shrinking the capacity of a slice, similar to calling realloc() in C with a smaller size argument than in your previous call on the same pointer? Is that an issue, and how should one deal with it?
To perform what is, in effect, a realloc of a slice:
a = append([]T(nil), a[:newSize]...) // Thanks to @Dijkstra for pointing out the missing ellipsis.
Whether it copies newSize elements to a new memory location or does an actual in-place resize as in realloc(3) is at the complete discretion of the compiler. You might want to investigate the current state and perhaps raise an issue if there's room for improvement.
However, this is likely a micro-optimization. The first source of performance improvements almost always lies in selecting a better algorithm and/or a better data structure. Using a huge vector only to keep a few items in the end is probably not the best option with respect to memory consumption.
EDIT: The above is only partially correct. The compiler cannot, in the general case, determine whether there are other pointers to the slice's backing array. Thus realloc is not applicable. The above snippet is actually guaranteed to perform a copy of newSize elements. Sorry for any confusion this may have caused.
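A runnable version of the snippet above, with T fixed to int64 to match the question, might look like this. The helper name shrink is hypothetical. The sketch only shows that the result is an independent, smaller slice; when the old array is actually reclaimed is up to the garbage collector.

```go
package main

import "fmt"

// shrink (hypothetical helper) returns a copy of the first n elements of a in
// a newly allocated backing array. It assumes 0 <= n <= len(a).
func shrink(a []int64, n int) []int64 {
	return append([]int64(nil), a[:n]...)
}

func main() {
	a := make([]int64, 0, 1000)
	for i := int64(0); i < 1000; i++ {
		a = append(a, i)
	}

	small := shrink(a, 10)
	fmt.Println(len(small), cap(small) < cap(a)) // 10 true

	small[0] = -1
	fmt.Println(a[0]) // 0: small no longer shares a's array
}
```

Once a itself goes out of use (for example after a = small), nothing references the large array any more and it becomes eligible for collection.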
Go does not have a way of shrinking slices. This isn't a problem in most cases, but if you profile your memory use and find you're using too much, you can do something about it:
Firstly, you can just create a slice of the size you need and copy your data into it with the built-in copy function. The garbage collector will then free the large slice once nothing else references it.
Secondly, you could re-use the big slice each time you wish to generate it, so you never allocate it more than once.
On a final note, you can use 1e7 instead of math.Pow10(7).
Let's see this example:
func main() {
s := []string{"A", "B", "C", "D", "E", "F", "G", "H"}
fmt.Println(s, len(s), cap(s)) // slice, length, capacity
t := s[2:4]
fmt.Println(t, len(t), cap(t))
u := make([]string, len(t))
copy(u, t)
fmt.Println(u, len(u), cap(u))
}
It produces the following output:
[A B C D E F G H] 8 8
[C D] 2 6
[C D] 2 2
s is a slice that holds 8 strings. t is a slice that keeps the part [C D]. The length of t is 2, but since it uses the same hidden array as s, its capacity is 6 (from "C" to "H"). The question is: how do you get a slice of [C D] that is independent of the hidden array of s? Simply create a new slice of strings with length 2 (slice u) and copy the content of t to u. u's underlying hidden array is different from the hidden array of s.
The initial problem was this: you have a big slice and you create a new smaller slice on it. Since the smaller slice uses the same hidden array, the garbage collector won't delete the hidden array.
See the bottom of this post for more info: http://blog.golang.org/go-slices-usage-and-internals .
Additionally, you can re-use most of the allocated memory while your app is running; take a look at the bufs package.
PS: if you allocate new memory for a smaller slice, the old memory may not be freed at the same time; it will be freed when the garbage collector decides to.
You can do that by re-assigning the slice's value to a portion of itself
a := []int{1,2,3}
fmt.Println(len(a), a) // 3 [1 2 3]
a = a[:len(a)-1]
fmt.Println(len(a), a) //2 [1 2]
There is a new feature called 3-index slice in Go 1.2, which means to get part of a slice in this way:
slice[a:b:c]
Here the len of the returned slice is b-a, and the cap of the new slice is c-a.
Tip: no copy is made in the whole process; it only returns a new slice that points to &slice[a] and has a len of b-a and a cap of c-a.
And that's the only thing you have to do:
slice = slice[0:len(slice):len(slice)]
Then the cap of the slice is changed to len(slice) - 0, which is the same as its len, and no copy is made.
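One caveat is worth checking with this technique: because no copy is made, the clipped slice still points into the original backing array, so the large array stays reachable for as long as the clipped slice is alive. The sketch below (the helper name clip is hypothetical) shows the reduced capacity and the continued sharing; releasing the memory still requires copying into a new slice.

```go
package main

import "fmt"

// clip (hypothetical helper) caps the capacity of s at its length using a
// full slice expression. No elements are copied.
func clip(s []int) []int {
	return s[:len(s):len(s)]
}

func main() {
	big := make([]int, 10, 1000)

	c := clip(big)
	fmt.Println(len(c), cap(c)) // 10 10

	c[0] = 42
	fmt.Println(big[0]) // 42: c still uses big's backing array
}
```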

Resources