Golang: Make function and third param

What is the difference between:
x := make([]int, 5, 10)
x := make([]int, 5)
x := [5]int{}
I know that make allocates an array and returns a slice that refers to that array. I don't understand where each of these should be used.
I can't find a good example that will clarify the situation.

x := make([]int, 5) Makes slice of int with length 5 and capacity 5 (same as length).
x := make([]int, 5, 10) Makes slice of int with length 5 and capacity 10.
x := [5]int{} Makes array of int with length 5.
Slices
If you need to append more items than the slice's capacity using the append function, the Go runtime will allocate a new underlying array and copy the existing one into it. So if you know the approximate length your slice will reach, it is better to declare the capacity explicitly. It will consume more memory for the underlying array at the beginning, but save CPU time otherwise spent on repeated allocations and array copying.
You can explore how len and cap change during append using a simple test on the Go playground.
Every time the cap value changes, a new array has been allocated.
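A minimal sketch of that kind of test, assuming only the standard library (capChanges is a hypothetical helper name): it appends one element at a time and prints len and cap whenever cap changes. The exact capacities depend on the Go version's growth policy, so no expected output is given.

```go
package main

import "fmt"

// capChanges appends n ints one at a time and records the capacity each
// time it differs from the previous one, i.e. each time append had to
// allocate a new underlying array.
func capChanges(n int) []int {
	var s []int
	var caps []int
	prevCap := cap(s)
	for i := 0; i < n; i++ {
		s = append(s, i)
		if cap(s) != prevCap {
			fmt.Printf("len=%d cap=%d (new array allocated)\n", len(s), cap(s))
			caps = append(caps, cap(s))
			prevCap = cap(s)
		}
	}
	return caps
}

func main() {
	capChanges(100)
}
```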
Arrays
Array size is fixed, so if you need to grow an array you have to create a new one with the new length and copy your old array into it yourself.
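A short sketch of growing an array by hand, as described above. The function name growArray and the sizes 5 and 10 are illustrative; because an array's length is part of its type, the new array must be declared with the larger size and filled with copy.

```go
package main

import "fmt"

// growArray copies a [5]int into a larger [10]int.
func growArray(small [5]int) [10]int {
	// A new, zero-valued array of the larger size.
	var bigger [10]int
	// copy operates on slices, so slice both arrays.
	copy(bigger[:], small[:])
	return bigger
}

func main() {
	a := [5]int{1, 2, 3, 4, 5}
	b := growArray(a)
	fmt.Println(b, len(b)) // [1 2 3 4 5 0 0 0 0 0] 10
}
```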
There are some great articles about slices and arrays in go:
http://blog.golang.org/go-slices-usage-and-internals
http://blog.golang.org/slices

The make([]int, 5, 10) line will allocate 10 ints' worth of memory at the very beginning, but returns you a slice of 5 ints. It does not use less memory; it saves you another memory allocation if you later need to expand the slice to no more than its capacity of 10.
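A small sketch of that point, comparing the address of the first element before and after appending (sameBackingArray is a hypothetical helper name). Appending up to the capacity keeps the original backing array; one element more forces a new one.

```go
package main

import "fmt"

// sameBackingArray reports whether appending up to the capacity kept the
// slice on its original backing array.
func sameBackingArray() bool {
	x := make([]int, 5, 10) // memory for 10 ints, 5 of them visible
	first := &x[0]

	x = append(x, 1, 2, 3, 4, 5) // len becomes 10 == cap: still fits
	fits := &x[0] == first

	x = append(x, 6)                                 // exceeds cap 10: append allocates a new array
	fmt.Println(len(x), cap(x) > 10, &x[0] == first) // 11 true false
	return fits
}

func main() {
	fmt.Println(sameBackingArray()) // true
}
```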

Related

What is the point in setting a slice's capacity?

In Golang, we can use the builtin make() function to create a slice with a given initial length and capacity.
Consider the following lines, where the slice's length is set to 1 and its capacity to 3:
package main
import "fmt"
func main() {
var slice = make([]int, 1, 3)
slice[0] = 1
slice = append(slice, 6, 0, 2, 4, 3, 1)
fmt.Println(slice)
}
I was surprised to see that this program prints:
[1 6 0 2 4 3 1]
This got me wondering: what is the point of initially defining a slice's capacity if append() can simply blow past it? Are there performance gains for setting a sufficiently large capacity?
A slice is really just a fancy way to manage an underlying array. It automatically tracks size, and re-allocates new space as needed.
As you append to a slice, the runtime grows its capacity (roughly doubling it for small slices, and by a smaller factor for large ones) every time it exceeds its current capacity. It has to copy all of the elements to do that. If you know how big it will be before you start, you can avoid a few copy operations and memory allocations by grabbing it all up front.
When you make a slice providing capacity, you set the initial capacity, not any kind of limit.
See this blog post on slices for some interesting internal details of slices.
A slice is a wonderful abstraction of a simple array. You get all sorts of nice features, but deep down at its core lies an array. (I explain the following in reverse order for a reason.) Therefore, if/when you specify a capacity of 3, deep down an array of length 3 is allocated in memory, which you can append up to without needing to reallocate memory. This argument is optional in the make call, but note that a slice always has a capacity whether or not you choose to specify one. If you specify a length (which always exists as well), the slice will be indexable up to that length. The rest of the capacity is hidden away behind the scenes, so append does not have to allocate an entirely new array when it is used.
Here is an example to better explain the mechanics.
s := make([]int, 1, 3)
The underlying array will be allocated with 3 of the zero value of int (which is 0):
[0,0,0]
However, the length is set to 1, so the slice itself will only print [0], and if you try to index the second or third value, it will panic, because the slice's length does not allow it. If you call s = append(s, 1), you will find that the slice was created to contain zero values up to its length, and you will end up with [0,1]. At this point, you can append once more before the entire underlying array is filled; another append after that will force the runtime to allocate a new array with doubled capacity and copy all the values over. This is a comparatively expensive operation.
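A runnable sketch of the walkthrough above. The helper indexPanics is hypothetical and uses recover only to show that indexing past len panics even when cap is larger. Reslicing with s[:cap(s)] is legal and exposes the whole underlying array.

```go
package main

import "fmt"

// indexPanics reports whether evaluating s[i] panics with an
// index-out-of-range error.
func indexPanics(s []int, i int) (panicked bool) {
	defer func() { panicked = recover() != nil }()
	_ = s[i]
	return false
}

func main() {
	s := make([]int, 1, 3)
	fmt.Println(s, len(s), cap(s)) // [0] 1 3
	fmt.Println(s[:cap(s)])        // [0 0 0]: the whole underlying array
	fmt.Println(indexPanics(s, 1)) // true: index 1 is beyond len, even though cap is 3

	s = append(s, 1)
	fmt.Println(s, len(s), cap(s)) // [0 1] 2 3
	s = append(s, 2)
	fmt.Println(s, len(s), cap(s)) // [0 1 2] 3 3 (the underlying array is now full)
	s = append(s, 3)               // no room left: append allocates a larger array and copies
	fmt.Println(s, len(s), cap(s)) // [0 1 2 3] 4 6
}
```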
Therefore the short answer to your question is that preallocating the capacity can be used to vastly improve the efficiency of your code. Especially so if the slice is either going to end up very large, or contains complex structs (or both), as the zero value of a struct is effectively the zero values of every single one of its fields. This is not because it would avoid allocating those values, as it has to anyway, but because append would have to reallocate new arrays full of these zero values each time it would need to resize the underlying array.
Short playground example: https://play.golang.org/p/LGAYVlw-jr
As others have already said, using the cap parameter can avoid unnecessary allocations. To give a sense of the performance difference, imagine you have a []float64 of random values and want a new slice that filters out values that are not above, say, 0.5.
Naive approach - no len or cap param
func filter(input []float64) []float64 {
ret := make([]float64, 0)
for _, el := range input {
if el > .5 {
ret = append(ret, el)
}
}
return ret
}
Better approach - using cap param
func filterCap(input []float64) []float64 {
ret := make([]float64, 0, len(input))
for _, el := range input {
if el > .5 {
ret = append(ret, el)
}
}
return ret
}
Benchmarks (n=10)
filter 131 ns/op 56 B/op 3 allocs/op
filterCap 56 ns/op 80 B/op 1 allocs/op
Using cap made the program 2x+ faster and reduced the number of allocations from 3 to 1. Now what happens at scale?
Benchmarks (n=1,000,000)
filter 9630341 ns/op 23004421 B/op 37 allocs/op
filterCap 6906778 ns/op 8003584 B/op 1 allocs/op
The speed difference is still significant (~1.4x) thanks to 36 fewer allocations as append grows the slice. However, the bigger difference is the memory allocation (~3x less).
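The benchmark harness behind these numbers is not shown. As a rough stand-in for a full go test -bench run, the sketch below counts average allocations per call with testing.AllocsPerRun for the filter and filterCap functions above. The input size of 1000, the random data, and the package-level sink variable are assumptions; exact counts depend on the data and the Go version, so none are claimed here.

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

// filter: no len or cap param, so append grows the slice repeatedly.
func filter(input []float64) []float64 {
	ret := make([]float64, 0)
	for _, el := range input {
		if el > .5 {
			ret = append(ret, el)
		}
	}
	return ret
}

// filterCap: capacity preallocated to the worst case.
func filterCap(input []float64) []float64 {
	ret := make([]float64, 0, len(input))
	for _, el := range input {
		if el > .5 {
			ret = append(ret, el)
		}
	}
	return ret
}

// sink keeps results reachable so the calls cannot be optimized away.
var sink []float64

func main() {
	input := make([]float64, 1000)
	for i := range input {
		input[i] = rand.Float64()
	}
	naive := testing.AllocsPerRun(100, func() { sink = filter(input) })
	withCap := testing.AllocsPerRun(100, func() { sink = filterCap(input) })
	fmt.Printf("filter:    %.0f allocs/op\n", naive)
	fmt.Printf("filterCap: %.0f allocs/op\n", withCap)
}
```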
Even better - calibrating the cap
You may have noticed in the first benchmark that cap makes the overall memory allocation worse (80B vs 56B). This is because you allocate 10 slots but only need, on average, 5 of them. This is why you don't want to set cap unnecessarily high. Given what you know about your program, you may be able to calibrate the capacity. In this case, we can estimate that our filtered slice will need 50% as many slots as the original slice.
func filterCalibratedCap(input []float64) []float64 {
ret := make([]float64, 0, len(input)/2)
for _, el := range input {
if el > .5 {
ret = append(ret, el)
}
}
return ret
}
Unsurprisingly, this calibrated cap allocates 50% as much memory as its predecessor, so that's a ~6x improvement on the naive implementation at 1m elements.
Another option - using direct access instead of append
If you are looking to shave even more time off a program like this, initialize with the len parameter (and ignore the cap parameter), access the new slice directly instead of using append, then throw away all the slots you don't need.
func filterLen(input []float64) []float64 {
ret := make([]float64, len(input))
var counter int
for _, el := range input {
if el > .5 {
ret[counter] = el
counter++
}
}
return ret[:counter]
}
This is ~10% faster than filterCap at scale. However, in addition to being more complicated, this pattern does not provide the same safety as cap if you try to calibrate the memory requirement.
With cap calibration, if you underestimate the total capacity required, then the program will automatically allocate more when it needs it.
With this approach, if you underestimate the total len required, the program will fail. In this example, if you initialize as ret := make([]float64, len(input)/2), and it turns out that len(output) > len(input)/2, then at some point the program will try to access a non-existent slot and panic.
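A minimal sketch contrasting the two failure modes above. filterLenUnderestimated is a hypothetical variant of filterLen that sizes its result with len(input)/2, and panics is a hypothetical helper that uses recover. The input is chosen so that every value passes the filter, which makes the len(input)/2 estimate too low.

```go
package main

import "fmt"

// filterCalibratedCap: an underestimated cap only costs an extra allocation.
func filterCalibratedCap(input []float64) []float64 {
	ret := make([]float64, 0, len(input)/2)
	for _, el := range input {
		if el > .5 {
			ret = append(ret, el) // grows automatically if the estimate was too low
		}
	}
	return ret
}

// filterLenUnderestimated writes by index into a slice of len(input)/2.
func filterLenUnderestimated(input []float64) []float64 {
	ret := make([]float64, len(input)/2)
	var counter int
	for _, el := range input {
		if el > .5 {
			ret[counter] = el // panics once counter reaches len(ret)
			counter++
		}
	}
	return ret[:counter]
}

// panics reports whether calling f panics.
func panics(f func()) (panicked bool) {
	defer func() { panicked = recover() != nil }()
	f()
	return false
}

func main() {
	input := []float64{0.6, 0.7, 0.8, 0.9}
	fmt.Println(filterCalibratedCap(input))                        // [0.6 0.7 0.8 0.9]
	fmt.Println(panics(func() { filterLenUnderestimated(input) })) // true
}
```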
Each time you add an item to a slice that has len(mySlice) == cap(mySlice), the underlying data structure is replaced with a larger structure.
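mySlice := make([]int, 8) // len 8 == cap 8: the slice is full
myNewItem := 42
fmt.Printf("Original Capacity: %v\n", cap(mySlice)) // Output: Original Capacity: 8
mySlice = append(mySlice, myNewItem)
fmt.Printf("New Capacity: %v\n", cap(mySlice)) // Output: New Capacity: 16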
Here, mySlice is replaced (through the assignment operator) with a new slice containing all the elements of the original mySlice, plus myNewItem, plus some room (capacity) to grow without triggering this resize.
As you can imagine, this resizing operation is computationally non-trivial.
Quite often, all the resize operations can be avoided if you know how many items you will need to store in mySlice. If you have this foreknowledge, you can set the capacity of the original slice upfront and avoid all the resize operations.
(In practice, it's quite often possible to know how many items will be added to a collection; especially when transforming data from one format to another.)
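As an illustration of the data-transformation case, here is a sketch that converts ints to strings (toStrings is a hypothetical name). The output has exactly one item per input, so the capacity is known up front and append never resizes.

```go
package main

import (
	"fmt"
	"strconv"
)

// toStrings converts each int to its decimal string form.
func toStrings(in []int) []string {
	// Exactly len(in) results, so preallocate that capacity.
	out := make([]string, 0, len(in))
	for _, n := range in {
		out = append(out, strconv.Itoa(n))
	}
	return out
}

func main() {
	out := toStrings([]int{1, 2, 3})
	fmt.Println(out, len(out), cap(out)) // [1 2 3] 3 3
}
```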

Best practices constructing an empty array

I'm wondering about best practices when initializing empty arrays.
i.e. Is there any difference here between arr1, arr2, and arr3?
myArr1 := []int{}
myArr2 := make([]int,0)
var myArr3 []int
I know that they make empty []int but I wonder, is one syntax preferable to the others? Personally I find the first to be most readable but that's beside the point here. One key point of contention may be the array capacity, presumably the default capacity is the same between the three as it is unspecified. Is declaring arrays of unspecified capacity "bad"? I can assume it comes with some performance cost but how "bad" is it really?
/tldr:
Is there any difference between the 3 ways to make an empty
array?
What is the default capacity of an array when unspecified?
What is the performance cost of using arrays with unspecified capacity?
First, it's a slice, not an array. Arrays and slices in Go are very different: arrays have a fixed size that is part of the type. I had trouble with this at first too :)
1. Not really. Any of the three is correct, and any difference should be too small to worry about. In my own code I generally use whatever is easiest in a particular case.
2. 0.
3. Nothing, until you need to add an item; then it costs whatever it takes to allocate the storage needed.
What is the performance cost of using arrays with unspecified capacity?
There is certainly a cost when you start populating the slice. If you know how big the slice should grow, you can allocate the capacity of the underlying array from the very beginning instead of reallocating every time the underlying array fills up.
Here is a simple example with timing:
package main
import "fmt"
func main() {
limit := 500 * 1000 * 1000
mySlice := make([]int, 0, limit) //vs mySlice := make([]int, 0)
for i := 0; i < limit; i++ {
mySlice = append(mySlice, i)
}
fmt.Println(len(mySlice))
}
On my machine:
time go run my_file.go
With preallocation:
real 0m2.129s
user 0m2.073s
sys 0m1.357s
Without preallocation
real 0m7.673s
user 0m9.095s
sys 0m3.462s
Is there any difference between the 3 ways to make an empty array?
if empty array means len(array)==0, the answer is no, although only myArr3==nil is true.
What is the default capacity of an array when unspecified?
the default capacity will be the same as the len you specify.
What is the performance cost of using arrays with unspecified capacity?
none
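A small sketch of the three declarations from the question (emptySlices is a hypothetical helper name). All three have length and capacity 0, only the var form is nil, and append works the same way on all of them.

```go
package main

import "fmt"

// emptySlices builds the three empty slices from the question.
func emptySlices() (myArr1, myArr2, myArr3 []int) {
	myArr1 = []int{}
	myArr2 = make([]int, 0)
	// myArr3 is left as the zero value of []int, which is nil.
	return myArr1, myArr2, myArr3
}

func main() {
	a, b, c := emptySlices()
	fmt.Println(len(a), cap(a), len(b), cap(b), len(c), cap(c)) // 0 0 0 0 0 0
	fmt.Println(a == nil, b == nil, c == nil)                   // false false true

	// append works on a nil slice exactly as on an empty one.
	c = append(c, 1)
	fmt.Println(c) // [1]
}
```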

Slices in Go: why does it allow appending more than the capacity allows?

The capacity parameter in making a slice in Go does not make much sense to me. For example,
aSlice := make([]int, 2, 2) //a new slice with length and cap both set to 2
aSlice = append(aSlice, 1, 2, 3, 4, 5) //append integers 1 through 5
fmt.Println("aSlice is: ", aSlice) //output [0 0 1 2 3 4 5]
If the slice allows inserting more elements than the capacity allows, why do we need to set it in the make() function?
The builtin append() function appends the elements to the specified slice if it has enough capacity to accommodate them.
But if the passed slice is not big enough, it allocates a new, big enough slice, copies the elements from the passed slice to the new slice, appends the elements to that new slice, and returns this new slice. Quoting from the append() documentation:
The append built-in function appends elements to the end of a slice. If it has sufficient capacity, the destination is resliced to accommodate the new elements. If it does not, a new underlying array will be allocated. Append returns the updated slice. It is therefore necessary to store the result of append, often in the variable holding the slice itself:
slice = append(slice, elem1, elem2)
slice = append(slice, anotherSlice...)
When making a slice with make if the length and capacity are the same, the capacity can be omitted, in which case it is defaulted to the specified length:
// These 2 declarations are equivalent:
s := make([]int, 2, 2)
s := make([]int, 2)
Also note that append() appends elements after the last element of the slice. And the above slices already have len(s) == 2 right after declaration so if you append even just 1 element to it, it will cause a reallocation as seen in this example:
s := make([]int, 2, 2)
fmt.Println(s, len(s), cap(s))
s = append(s, 1)
fmt.Println(s, len(s), cap(s))
Output:
[0 0] 2 2
[0 0 1] 3 4
So in your example what you should do is something like this:
s := make([]int, 0, 10) // Create a slice with length=0 and capacity=10
fmt.Println(s, len(s), cap(s))
s = append(s, 1)
fmt.Println(s, len(s), cap(s))
Output:
[] 0 10
[1] 1 10
I recommend the following blog articles if you want to understand slices in more details:
Go Slices: usage and internals
Arrays, slices (and strings): The mechanics of 'append'
It is mainly an optimization, and it is not unique to Go; similar structures in other languages have this as well.
When you append more than the capacity, the runtime needs to allocate more memory for the new elements. This is costly and can also cause memory fragmentation.
By specifying the capacity, the runtime allocates what is needed in advance, and avoids reallocations. However if you do not know the estimated capacity in advance or it changes, you do not have to set it, and the runtime reallocates what is needed and grows the capacity itself.

Does Go have no real way to shrink a slice? Is that an issue?

I've been trying out Go for some time and this question keeps bugging me. Say I build up a somewhat large dataset in a slice (say, 10 million int64s).
package main
import (
"math"
"fmt"
)
func main() {
var a []int64
var i int64
upto := int64(math.Pow10(7))
for i = 0; i < upto; i++ {
a = append(a, i)
}
fmt.Println(cap(a))
}
But then I decide I don't want most of them so I want to end up with a slice of just 10 of those. I've tried both slicing and delete techniques on Go's wiki but none of them seem to reduce the slice's capacity.
So that's my question: does Go have no real way of shrinking the capacity of a slice, similar to realloc()-ing with a smaller size argument than in your previous call on the same pointer in C? Is that an issue, and how should one deal with it?
To perform, in effect, a realloc of a slice:
a = append([]T(nil), a[:newSize]...) // Thanks to @Dijkstra for pointing out the missing ellipsis.
Whether it copies newSize elements to a new memory location or does an actual in-place resize as in realloc(3) is at the complete discretion of the compiler. You might want to investigate the current state and perhaps raise an issue if there's room for improvement.
However, this is likely a micro-optimization. The first source of performance enhancements almost always lies in selecting a better algorithm and/or a better data structure. Using a hugely sized vector to finally keep only a few items is probably not the best option with respect to memory consumption.
EDIT: The above is only partially correct. The compiler cannot, in the general case, determine whether there are other pointers to the slice's backing array. Thus the realloc is not applicable. The above snippet is actually guaranteed to perform a copy of newSize elements. Sorry for any confusion possibly created.
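A sketch of the append idiom from that answer, specialized to []int64 (shrink is a hypothetical name, and the sizes are illustrative rather than the 10 million from the question). The result lives in a fresh backing array, so once nothing else refers to the old array, the garbage collector can reclaim it. Note that append may round the new capacity up to an allocator size class, so cap is only guaranteed to be at least newSize.

```go
package main

import "fmt"

// shrink copies the first newSize elements of a into a newly allocated array.
func shrink(a []int64, newSize int) []int64 {
	return append([]int64(nil), a[:newSize]...)
}

func main() {
	a := make([]int64, 0, 1000)
	for i := int64(0); i < 1000; i++ {
		a = append(a, i)
	}
	small := shrink(a, 10)
	small[0] = -1                                      // does not affect a: the data was copied
	fmt.Println(len(small), cap(small) < cap(a), a[0]) // 10 true 0
}
```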
Go does not have a way of shrinking slices. This isn't a problem in most cases, but if you profile your memory use and find you're using too much, you can do something about it:
Firstly, you can just create a slice of the size you need and copy your data into it (see the copy built-in). The garbage collector will then free the large slice.
Secondly, you could re-use the big slice each time you wish to generate it, so you never allocate it more than once.
On a final note, you can use 1e7 instead of math.Pow10(7).
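For the second suggestion (re-using the big slice), a common pattern is to reslice to buf[:0], which sets the length to 0 but keeps the capacity. fillEvens is a hypothetical worker used only for illustration; after the first allocation, no new backing array is needed as long as the data fits.

```go
package main

import "fmt"

// fillEvens refills buf with the first n even numbers, reusing its storage.
func fillEvens(buf []int, n int) []int {
	buf = buf[:0] // length 0, capacity unchanged
	for i := 0; i < n; i++ {
		buf = append(buf, 2*i)
	}
	return buf
}

func main() {
	buf := make([]int, 0, 1024) // allocated once
	for round := 0; round < 3; round++ {
		buf = fillEvens(buf, 100)
	}
	fmt.Println(len(buf), cap(buf)) // 100 1024
}
```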
Let's see this example:
package main
import "fmt"
func main() {
s := []string{"A", "B", "C", "D", "E", "F", "G", "H"}
fmt.Println(s, len(s), cap(s)) // slice, length, capacity
t := s[2:4]
fmt.Println(t, len(t), cap(t))
u := make([]string, len(t))
copy(u, t)
fmt.Println(u, len(u), cap(u))
}
It produces the following output:
[A B C D E F G H] 8 8
[C D] 2 6
[C D] 2 2
s is a slice that holds 8 strings. t is a slice that holds the part [C D]. The length of t is 2, but since it uses the same hidden array as s, its capacity is 6 (from "C" to "H"). The question is: how do you get a slice of [C D] that is independent of the hidden array of s? Simply create a new slice of strings with length 2 (slice u) and copy the content of t into u. u's underlying hidden array is different from the hidden array of s.
The initial problem was this: you have a big slice and you create a new smaller slice on it. Since the smaller slice uses the same hidden array, the garbage collector won't delete the hidden array.
See the bottom of this post for more info: http://blog.golang.org/go-slices-usage-and-internals .
Additionally, you can re-use most of the allocated memory while your app runs; take a look at the bufs package.
PS: if you allocate new memory for a smaller slice, the old memory may not be freed at the same time; it will be freed when the garbage collector decides to.
You can shorten a slice (reducing its length, though not its capacity) by re-assigning the slice's value to a portion of itself:
a := []int{1,2,3}
fmt.Println(len(a), a) // 3 [1 2 3]
a = a[:len(a)-1]
fmt.Println(len(a), a) // 2 [1 2]
There is a feature called the 3-index slice (full slice expression), added in Go 1.2, which lets you take part of a slice this way:
slice[a:b:c]
The len of the returned slice will be b-a, and its cap will be c-a.
Tip: no copy is done in the whole process; it only returns a new slice that points to &slice[a], with len b-a and cap c-a.
And that's the only thing you have to do:
slice = slice[0:len(slice):len(slice)]
Then the cap of the slice becomes len(slice) - 0, which is the same as its len, and no copy is done. Note that the underlying array is still the same, so this limits the capacity visible through the slice but does not release any memory.
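A minimal sketch of the full slice expression described above (clip is a hypothetical helper name). It shows that the capacity drops to the length, that both slices still share one array, and that a later append on the clipped slice must allocate instead of overwriting the original's spare slots.

```go
package main

import "fmt"

// clip returns s with its capacity reduced to its length using
// s[low:high:max] (len = high-low, cap = max-low). No elements are copied.
func clip(s []int) []int {
	return s[0:len(s):len(s)]
}

func main() {
	s := make([]int, 3, 10)
	s[0], s[1], s[2] = 1, 2, 3

	clipped := clip(s)
	fmt.Println(len(clipped), cap(clipped)) // 3 3

	// Both slices still share one array: a write through one is seen by the other.
	clipped[0] = 100
	fmt.Println(s[0]) // 100

	// clipped has no spare capacity, so append must allocate a new array
	// and cannot overwrite s's spare slot at index 3.
	clipped = append(clipped, 4)
	s = s[:4]
	fmt.Println(s[3], clipped[3]) // 0 4
}
```

Since Go 1.21, the standard library's slices.Clip performs the same s[:len(s):len(s)] operation.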

Why use make() to create a slice in Go?

What is the difference between var a [4]int and b := make([]int, 4)? The b can be extended, but not a, right? But if I know that I need exactly, e.g., 4 elements, is an array faster than a slice?
Is there any performance difference between var d []int and e := make([]int, 0)? Would f := make([]int, 5) provide more performance than without the length for the first, e.g., 5 elements?
Would this c := make([]int, 5, 10) not allocate more memory than I can access?
a is an array, and b is a slice. What makes slices different from arrays is that a slice is a pointer to an array; slices are reference types, which means that if you assign one slice to another, both refer to the same underlying array. For instance, if a function takes a slice argument, changes it makes to the elements of the slice will be visible to the caller, analogous to passing a pointer to the underlying array (above from Learning Go). You can easily use append and copy with a slice. An array should be a little faster than a slice, but it doesn't make much difference. Unless you know the size exactly, it is better to use a slice, which makes things easier.
With make([]type, length, capacity), you can estimate the size and possible capacity to improve performance.
For more details, see Go Slices: usage and internals.
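A short sketch of the array-versus-slice difference quoted above (setFirstArray and setFirstSlice are hypothetical names). Passing an array copies all of its elements, while passing a slice copies only the small slice header, so writes through the slice reach the caller's underlying array.

```go
package main

import "fmt"

// setFirstArray receives a copy of the array, so the caller's array is unchanged.
func setFirstArray(a [4]int) { a[0] = 99 }

// setFirstSlice receives a slice header pointing at the caller's array,
// so the write is visible to the caller.
func setFirstSlice(b []int) { b[0] = 99 }

func main() {
	var a [4]int
	b := make([]int, 4)
	setFirstArray(a)
	setFirstSlice(b)
	fmt.Println(a, b) // [0 0 0 0] [99 0 0 0]
}
```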
