Efficient allocation of slices (cap vs length) - go

Assuming I am creating a slice that I know in advance I will populate with 1e5 elements in a for loop, via successive calls to append:
// Append 1e5 strings to the slice
for i := 0; i < 1e5; i++ {
	value := fmt.Sprintf("Entry: %d", i)
	myslice = append(myslice, value)
}
which is the more efficient way of initialising the slice and why:
a. declaring a nil slice of strings?
var myslice []string
b. setting its length in advance to 1e5?
myslice = make([]string, 1e5)
c. setting both its length and capacity to 1e5?
myslice = make([]string, 1e5, 1e5)

Your b and c solutions are identical: when you create a slice with make() and don't specify the capacity, the "missing" capacity defaults to the given length.
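A quick way to check this is to compare len and cap for both forms. The following is only a small sketch; the helper name lenCap and the size 5 are illustrative and not part of the question:

```go
package main

import "fmt"

// lenCap (hypothetical helper) returns the length and capacity of a slice
// created without and with an explicit capacity argument.
func lenCap(n int) (lenB, capB, lenC, capC int) {
	b := make([]string, n)    // capacity omitted: defaults to the length
	c := make([]string, n, n) // capacity given explicitly
	return len(b), cap(b), len(c), cap(c)
}

func main() {
	fmt.Println(lenCap(5)) // prints "5 5 5 5"
}
```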
Also note that if you create the slice with a length in advance, you can't use append() to populate the slice, because it adds new elements to the slice, and it doesn't "reuse" the allocated elements. So in that case you have to assign values to the elements using an index expression, e.g. myslice[i] = value.
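To illustrate the point about append not reusing pre-sized elements, here is a minimal sketch with a length of 3 instead of 1e5. The function names are made up for this example:

```go
package main

import "fmt"

// appendAfterLen uses append on a slice that already has length 3:
// the new element lands after the three existing empty strings.
func appendAfterLen() []string {
	s := make([]string, 3) // len 3, elements are ""
	s = append(s, "x")     // becomes s[3]; s[0] is not overwritten
	return s
}

// assignByIndex fills the pre-sized elements in place instead.
func assignByIndex() []string {
	s := make([]string, 3)
	for i := range s {
		s[i] = fmt.Sprintf("Entry: %d", i)
	}
	return s
}

func main() {
	fmt.Printf("%q\n", appendAfterLen()) // prints ["" "" "" "x"]
	fmt.Printf("%q\n", assignByIndex())  // prints ["Entry: 0" "Entry: 1" "Entry: 2"]
}
```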
If you start with a slice with 0 capacity, a new backing array has to be allocated and the "old" content has to be copied over whenever you append an element that does not fit into the capacity, so that solution is inherently slower.
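One way to see these reallocations is to watch cap() while appending. This is a rough sketch under the assumption that a change in capacity means a new backing array was allocated; countGrowths is a hypothetical helper, and the exact count for the zero-capacity case depends on the Go version's growth strategy:

```go
package main

import "fmt"

// countGrowths appends n ints to a slice created with the given initial
// capacity and counts how often the capacity changes, i.e. how often
// append had to move to a larger backing array.
func countGrowths(n, initialCap int) int {
	s := make([]int, 0, initialCap)
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			growths++
		}
	}
	return growths
}

func main() {
	// No specific number is claimed for the first line: it varies
	// between Go releases.
	fmt.Println("zero capacity:", countGrowths(100000, 0))
	fmt.Println("preallocated: ", countGrowths(100000, 100000))
}
```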
I would define and consider the following different solutions (I use an []int slice so that fmt.Sprintf() does not interfere with the benchmarks):
var s []int
func BenchmarkA(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s = nil
		for j := 0; j < 1e5; j++ {
			s = append(s, j)
		}
	}
}
func BenchmarkB(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s = make([]int, 0, 1e5)
		for j := 0; j < 1e5; j++ {
			s = append(s, j)
		}
	}
}
func BenchmarkBLocal(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s := make([]int, 0, 1e5)
		for j := 0; j < 1e5; j++ {
			s = append(s, j)
		}
	}
}
func BenchmarkD(b *testing.B) {
	for i := 0; i < b.N; i++ {
		s = make([]int, 1e5)
		for j := range s {
			s[j] = j
		}
	}
}
Note: I use package-level variables in the benchmarks (except BLocal), because some optimizations may (and actually do) happen when using a local slice variable.
And the benchmark results:
BenchmarkA-4 1000 1081599 ns/op 4654332 B/op 30 allocs/op
BenchmarkB-4 3000 371096 ns/op 802816 B/op 1 allocs/op
BenchmarkBLocal-4 10000 172427 ns/op 802816 B/op 1 allocs/op
BenchmarkD-4 10000 167305 ns/op 802816 B/op 1 allocs/op
A: As you can see, starting with a nil slice is the slowest, uses the most memory and allocations.
B: Pre-allocating the slice with capacity (but still 0 length) and using append: it requires only a single allocation and is much faster, almost thrice as fast.
BLocal: Do note that when using a local slice instead of a package variable, (compiler) optimizations happen and it gets a lot faster: twice as fast, almost as fast as D.
D: Not using append() but assigning elements to a preallocated slice wins in every aspect, even when using a non-local variable.

For this use case, since you already know the number of string elements that you want to assign to the slice, I would prefer approach b or c, because both allocate the backing array once and so prevent the slice from being resized.
If you choose approach a, append has to grow the slice every time a new element is added while len equals cap: a larger backing array is allocated (roughly double the capacity for small slices, a smaller factor for large ones) and the existing elements are copied over.
https://play.golang.org/p/kSuX7cE176j
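Note that b and c as written set the length, so combining them with append would leave 1e5 empty strings in front of the appended ones. If the append loop from the question is to be kept, a sketch that reserves only the capacity could look like the following; buildEntries is a hypothetical helper name, not something from the question:

```go
package main

import "fmt"

// buildEntries reserves capacity for n strings up front and then appends,
// so the backing array is allocated once and never resized.
func buildEntries(n int) []string {
	entries := make([]string, 0, n) // len 0, cap n
	for i := 0; i < n; i++ {
		entries = append(entries, fmt.Sprintf("Entry: %d", i))
	}
	return entries
}

func main() {
	e := buildEntries(100000)
	fmt.Println(len(e), e[0], e[len(e)-1]) // prints "100000 Entry: 0 Entry: 99999"
}
```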

Related

for loop speed comparison

I was wondering how fast was the len operator in Go and I wrote a simple benchmark. My expectations were that by avoiding calling len during each loop iteration, the code would run faster, but it is in fact the opposite.
Here's the benchmark:
func sumArrayNumber(input []int) int {
	var res int
	for i, length := 0, len(input); i < length; i += 1 {
		res += input[i]
	}
	return res
}
func sumArrayNumber2(input []int) int {
	var res int
	for i := 0; i < len(input); i += 1 {
		res += input[i]
	}
	return res
}
var result int
var input = []int{3, 6, 22, 68, 11, -7, 22, 5, 0, 0, 1}
func BenchmarkSumArrayNumber(b *testing.B) {
	var r int
	for n := 0; n < b.N; n++ {
		r = sumArrayNumber(input)
	}
	result = r
}
func BenchmarkSumArrayNumber2(b *testing.B) {
	var r int
	for n := 0; n < b.N; n++ {
		r = sumArrayNumber2(input)
	}
	result = r
}
And here are the results:
goos: windows
goarch: amd64
BenchmarkSumArrayNumber-8 300000000 4.75 ns/op
BenchmarkSumArrayNumber2-8 300000000 4.67 ns/op
PASS
ok command-line-arguments 4.000s
I confirmed the results are consistent by doing the following:
doubling the input array size roughly doubles the execution time per op. The speed difference scales with the length of the input array.
exchanging the test order does not impact the results.
Why is the code that checks len() at every loop iteration faster?
One may argue that a difference of 0.08 ns is not statistically significant enough to say that one for loop is faster than the other. You probably need to run the same test many times (at least 20), at which point you should be able to derive the mean and standard deviation.
Moreover, many factors can speed up len(), such as the CPU cache and compiler optimizations. I think the most relevant factor in your specific example is that len() on a slice just reads the len field of the slice's data structure (and for an array the length is a compile-time constant). Thus, it is O(1).

Golang slice append and reallocation

I've been learning go recently and had a question about the behavior of slices when reallocation occurs. Assume I have a slice of pointers to a struct, such as:
var a []*A
My understanding is that passing a slice to a function passes a slice header by value. If I pass this slice to another function that runs on a separate goroutine and only reads from the slice, while the function that launched the goroutine continues to append to the slice, is that a problem? For example:
package main
import "fmt"
type A struct {
	foo int
}
func main() {
	a := make([]*A, 0, 100)
	ch := make(chan int)
	for i := 0; i < 100; i++ {
		a = append(a, &A{i})
	}
	// The goroutine receives its own copy of the slice header.
	go read_slice(a, ch)
	for i := 0; i < 100; i++ {
		a = append(a, &A{i + 100})
	}
	<-ch
}
func read_slice(a []*A, ch chan int) {
	for i := range a {
		fmt.Printf("%d ", a[i].foo)
	}
	ch <- 1
}
So from my understanding, as the read_slice() function is running on its own goroutine with a copy of the slice header, it has an underlying pointer to the current backing array and the size at the time it was called through which I can access the foo's.
However, when the other goroutine is appending to the slice it will trigger a reallocation when the capacity is exceeded. Does the go runtime not deallocate the memory to the old backing array being used in read_slice() since there is a reference to it in that function?
I tried running this with "go run -race slice.go" but that didn't report anything, but I feel like I might be doing something wrong here? Any pointers would be appreciated.
Thanks!
The GC does not collect the backing array until there are no references to the backing array. There are no races in the program.
Consider the scenario with no goroutines:
a := make([]*A, 0, 100)
for i := 0; i < 100; i++ {
	a = append(a, &A{i})
}
b := a
for i := 0; i < 100; i++ {
	b = append(b, &A{i + 100})
}
The slice a will continue to reference the backing array with the first 100 pointers when append to b allocates a new backing array. The slice a is not left with a dangling reference to a backing array.
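The same behaviour can be observed on a smaller scale. This is a minimal sketch using ints and a capacity of 4 rather than the question's types; the function name diverge is made up for illustration:

```go
package main

import "fmt"

// diverge appends to a copy of a full slice and reports the lengths,
// the first element of the original, and whether the two slices still
// share a backing array afterwards.
func diverge() (lenA, lenB, a0 int, shared bool) {
	a := make([]int, 0, 4)
	for i := 0; i < 4; i++ {
		a = append(a, i)
	}
	b := a           // copies the slice header; same backing array for now
	b = append(b, 4) // capacity exceeded: b moves to a new backing array
	b[0] = 99        // no longer visible through a
	return len(a), len(b), a[0], &a[0] == &b[0]
}

func main() {
	fmt.Println(diverge()) // prints "4 5 0 false"
}
```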
Now add the goroutine to the scenario:
a := make([]*A, 0, 100)
for i := 0; i < 100; i++ {
	a = append(a, &A{i})
}
b := a
go read_slice(a, ch)
for i := 0; i < 100; i++ {
	b = append(b, &A{i + 100})
}
The goroutine can happily use slice a. There's no dangling reference.
Now consider the program in the question. It's functionally identical to the last snippet here.
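As a self-contained sketch of that pattern, the following uses ints and returns the result over a channel so it can be checked; sumSnapshot and the anonymous reader are illustrative names, not from the question. It relies on the slice being full (len equals cap) when the goroutine starts, so the first append in the caller moves to a new backing array and the old one is only ever read:

```go
package main

import "fmt"

// sumSnapshot starts a goroutine that reads from a copy of the slice
// header while the caller keeps appending to its own slice variable.
func sumSnapshot() (sum, finalLen int) {
	a := make([]int, 0, 100)
	for i := 0; i < 100; i++ {
		a = append(a, i)
	}
	ch := make(chan int)
	// snapshot has len 100 and points at the original backing array.
	go func(snapshot []int) {
		total := 0
		for _, v := range snapshot {
			total += v
		}
		ch <- total
	}(a)
	for i := 0; i < 100; i++ {
		a = append(a, i+100) // the first append allocates a new array
	}
	return <-ch, len(a)
}

func main() {
	fmt.Println(sumSnapshot()) // prints "4950 200"
}
```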

performance of for range in go

When ranging over an array, two values are returned for each iteration. The first is the index, and the second is a copy of the element at that index.
Here's my code:
var myArray = [5]int{1, 2, 3, 4, 5}
sum := 0
// first with copy
for _, value := range myArray {
	sum += value
}
// second without copy
for i := range myArray {
	sum += myArray[i]
}
Which one should I use for better performance?
Is there any difference for built-in types in these two pieces of code?
We can test this using Go's benchmarking tool (read more at https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go).
sum_test.go
package sum
import "testing"
func BenchmarkSumIterator(b *testing.B) {
	var ints = [5]int{1, 2, 3, 4, 5}
	sum := 0
	for i := 0; i < b.N; i++ {
		for j := range ints {
			sum += ints[j]
		}
	}
}
func BenchmarkSumRange(b *testing.B) {
	var ints = [5]int{1, 2, 3, 4, 5}
	sum := 0
	for i := 0; i < b.N; i++ {
		for _, value := range ints {
			sum += value
		}
	}
}
Run it with:
$ go test -bench=. sum_test.go
goos: linux
goarch: amd64
BenchmarkSumIterator-4 412796047 2.97 ns/op
BenchmarkSumRange-4 413581974 2.89 ns/op
PASS
ok command-line-arguments 3.010s
Range appears to be slightly more efficient. Running this benchmark a few more times also confirms this. It's worth noting that this may only be true for this specific case where you have a small fixed size array. You should try to make decisions like these based on what you'd encounter in production and also try to reconcile that with code readability.
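The second one is faster, but the difference is small enough to ignore.
The main difference shows up with a big loop: in that case the first loop may take more memory than the second one, because it copies each element into value, and ranging over an array with a value variable semantically operates on a copy of the array.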
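The copy made when ranging over an array with a value variable can be observed directly. The following is a small sketch with made-up function names; it modifies the array during iteration and records what the loop sees, once for the array itself and once for a slice of it:

```go
package main

import "fmt"

// rangeOverArray modifies the array during iteration. The range
// expression is evaluated once, so the loop iterates over a copy and
// still sees the original values.
func rangeOverArray() []int {
	arr := [3]int{1, 2, 3}
	var seen []int
	for i, v := range arr {
		if i == 0 {
			arr[2] = 100
		}
		seen = append(seen, v)
	}
	return seen
}

// rangeOverSlice does the same through a slice of the array, which
// shares the array's storage, so the change is visible.
func rangeOverSlice() []int {
	arr := [3]int{1, 2, 3}
	var seen []int
	for i, v := range arr[:] {
		if i == 0 {
			arr[2] = 100
		}
		seen = append(seen, v)
	}
	return seen
}

func main() {
	fmt.Println(rangeOverArray()) // prints "[1 2 3]"
	fmt.Println(rangeOverSlice()) // prints "[1 2 100]"
}
```

Whether the compiler actually materialises the copy in a given loop is an optimization detail, so any memory difference should be measured rather than assumed.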

64-bit Unsigned Integer concat benchmarking

I have a tuning question. I had some repetitive pieces of code in my web-app requests where simple things like fmt.Sprintf("%d%d", 1293123, 234993923) happen.
Now I did some benchmarks and tested a bytes writer, which was quite slow (~400 ns/op). The sprintf itself was about 240 ns/op. Then I did strings joining:
var concatBuffStr []byte
func BenchmarkStringsJoin(b *testing.B) {
	for n := 0; n < b.N; n++ {
		concatBuffStr = []byte(strings.Join([]string{
			strconv.Itoa(2),
			strconv.Itoa(3),
		}, " "))
	}
}
152 ns/op
A big improvement already. But then I tried clunking 2 ints converted to bytes together
var concatBuffStr []byte
func BenchmarkConcatIntsAsBytes(b *testing.B) {
	for n := 0; n < b.N; n++ {
		aBuf := make([]byte, 8)
		bBuf := make([]byte, 8)
		binary.BigEndian.PutUint64(aBuf, uint64(2))
		binary.BigEndian.PutUint64(bBuf, uint64(3))
		concatBuffStr = append(aBuf, bBuf...)
	}
}
57.8 ns/op
Amazing! But I could just avoid glueing them together and already reserve the full 16 bytes for 2 maxed out 64 bit uints spaces:
var concatBuffStr []byte
func BenchmarkCopyIntsAsBytes(b *testing.B) {
	for n := 0; n < b.N; n++ {
		concatBuffStr = make([]byte, 16)
		bBuf := make([]byte, 8)
		binary.BigEndian.PutUint64(concatBuffStr, uint64(123123))
		binary.BigEndian.PutUint64(bBuf, uint64(3453455))
		copy(concatBuffStr[8:], bBuf)
	}
}
30.4 ns/op
By now we're 8 times faster than with the fmt.Sprintf() method.
Was wondering if there is an even faster way than this. I want to avoid unsafe tho.
I was also thinking of checking the max value of both ints, and if they are below the max of a uint32 or uint16 I could customize the logic. Benchmarking the uint16 downgrade by hand gives around 23 ns/op, but that's not realistic: in the web request itself it would still need to check the size of the ints and do extra logic, which will likely result in more overhead than that 7 ns/op gain.
Update
I managed to simplify it a little bit more:
var concatBuffStr []byte
func BenchmarkCopyIntsAsBytesShort(b *testing.B) {
	for n := 0; n < b.N; n++ {
		concatBuffStr = make([]byte, 16)
		binary.BigEndian.PutUint64(concatBuffStr, uint64(123123))
		binary.BigEndian.PutUint64(concatBuffStr[8:], uint64(3453455))
	}
}
28.6 ns/op

Initialize golang slice with int numbers from 0 to N

I am almost certain that I read about a simple "tricky" way to initialize slice of ints with the numbers from 0 to N, but I cannot find it anymore.
What is the simplest way to do this?
You just use make passing N for the length then use a simple for loop to set the values...
mySlice := make([]int, N)
for i := 0; i < N; i++ {
	mySlice[i] = i
}
Here's a full example on play; https://play.golang.org/p/yvyzuWxN1M
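If this is needed in several places, the loop can be wrapped in a small function. This is just a sketch; makeRange is a hypothetical helper name. It returns 0 through n-1 like the loop above, so call it with N+1 if N itself should be included:

```go
package main

import "fmt"

// makeRange returns a slice holding 0, 1, ..., n-1.
func makeRange(n int) []int {
	s := make([]int, n)
	for i := range s {
		s[i] = i
	}
	return s
}

func main() {
	fmt.Println(makeRange(5)) // prints "[0 1 2 3 4]"
}
```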
