Appending to a slice: bad performance... why? - performance

I'm currently creating a game in Go and measuring the FPS. I'm noticing about a 7 fps loss when using a for loop to append to a slice like so:
vertexInfo := Opengl.OpenGLVertexInfo{}
for i := 0; i < 4; i = i + 1 {
vertexInfo.Translations = append(vertexInfo.Translations, float32(s.x), float32(s.y), 0)
vertexInfo.Rotations = append(vertexInfo.Rotations, 0, 0, 1, s.rot)
vertexInfo.Scales = append(vertexInfo.Scales, s.xS, s.yS, 0)
vertexInfo.Colors = append(vertexInfo.Colors, s.r, s.g, s.b, s.a)
}
I'm doing this for every sprite, every draw. The question is: why do I get such a huge performance hit from just looping four times and appending the same thing to these slices? Is there a more efficient way to do this? It's not like I'm adding an exorbitant amount of data; each slice contains about 16 elements as shown above (4 x 4).
When I simply put all 16 elements in one []float32{1..16}, fps improves by about 4.
Update: I benchmarked each append and it seems that each one costs about 1 fps. That seems like a lot considering this data is pretty static; I only need 4 iterations...
Update: Added github repo https://github.com/Triangle345/GT

The builtin append() needs to allocate a new backing array if the capacity of the destination slice is less than what the length of the slice would be after the append. It also has to copy the current elements from the old array to the newly allocated one, so there is significant overhead.
The slices you append to are most likely empty, since you created your Opengl.OpenGLVertexInfo value with an empty composite literal. Even though append() plans ahead and allocates a bigger array than what is needed to append the specified elements, chances are that in your case multiple reallocations will be needed to complete the 4 iterations.
You may avoid reallocations if you create and initialize vertexInfo like this:
vertexInfo := Opengl.OpenGLVertexInfo{
Translations: []float32{float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0},
Rotations: []float64{0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot},
Scales: []float64{s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0},
Colors: []float64{s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a},
}
Also note that this composite literal takes care of sizing the backing arrays, so no reallocation is needed. But if in other places of your code (which we don't see) you append further elements to these slices, that may still cause reallocations. If that is the case, create the slices with a larger capacity covering those "future" appends (e.g. make([]float64, 16, 32)).
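If you prefer to keep the loop, a middle ground is to reserve the needed capacity once and let append fill it. A minimal sketch, using a simplified stand-in struct with plain []float32 fields (the real field types in Opengl.OpenGLVertexInfo may differ):

// vertexInfo is a hypothetical stand-in for Opengl.OpenGLVertexInfo.
type vertexInfo struct {
    Translations []float32
    Rotations    []float32
    Scales       []float32
    Colors       []float32
}

func newVertexInfo(x, y, rot, xS, yS, r, g, b, a float32) vertexInfo {
    // Reserve exactly what the 4 iterations will need, so append never reallocates.
    vi := vertexInfo{
        Translations: make([]float32, 0, 4*3),
        Rotations:    make([]float32, 0, 4*4),
        Scales:       make([]float32, 0, 4*3),
        Colors:       make([]float32, 0, 4*4),
    }
    for i := 0; i < 4; i++ {
        vi.Translations = append(vi.Translations, x, y, 0)
        vi.Rotations = append(vi.Rotations, 0, 0, 1, rot)
        vi.Scales = append(vi.Scales, xS, yS, 0)
        vi.Colors = append(vi.Colors, r, g, b, a)
    }
    return vi
}

If the same value is rebuilt every draw, reusing the slices across frames (resetting them with vi.Translations = vi.Translations[:0] and so on) avoids the per-draw allocations entirely.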

An empty slice has no backing storage. To append to it, memory must be allocated, and as you keep appending, more memory has to be allocated each time the capacity runs out.
To speed it up, use a fixed-size array, use make to create a slice with the correct length and capacity up front, or initialize the slice with its items when you declare it.
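To see the cost concretely, a small benchmark along these lines (names are illustrative; it belongs in a _test.go file) compares growing from an empty slice against appending into preallocated capacity:

package appendbench

import "testing"

// Appending into a nil slice: append must grow the backing array several times.
func BenchmarkAppendEmpty(b *testing.B) {
    for i := 0; i < b.N; i++ {
        var s []float32
        for j := 0; j < 16; j++ {
            s = append(s, float32(j))
        }
        _ = s
    }
}

// Appending into a slice whose capacity was reserved up front: no regrowth.
func BenchmarkAppendPrealloc(b *testing.B) {
    for i := 0; i < b.N; i++ {
        s := make([]float32, 0, 16)
        for j := 0; j < 16; j++ {
            s = append(s, float32(j))
        }
        _ = s
    }
}

Running go test -bench . -benchmem should show noticeably fewer allocations per operation for the preallocated version; the exact numbers depend on the Go version and hardware.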

Related

How does img.At(x, y) correlate with a uint32[][] structure [duplicate]

I am learning Go by going through A Tour of Go. One of the exercises there asks me to create a 2D slice of dy rows and dx columns containing uint8. My current approach, which works, is this:
a := make([][]uint8, dy) // initialize a slice of dy slices
for i := 0; i < dy; i++ {
a[i] = make([]uint8, dx) // initialize a slice of dx uint8 in each of the dy slices
}
I think that iterating through each slice to initialize it is too verbose. And if the slice had more dimensions, the code would become unwieldy. Is there a concise way to initialize 2D (or n-dimensional) slices in Go?
There isn't a more concise way; what you did is the "right" way, because slices are always one-dimensional but may be composed to construct higher-dimensional objects. See this question for more details: Go: How is two dimensional array's memory representation.
One thing you can simplify on it is to use the for range construct:
a := make([][]uint8, dy)
for i := range a {
a[i] = make([]uint8, dx)
}
Also note that if you initialize your slice with a composite literal, you get this for "free", for example:
a := [][]uint8{
{0, 1, 2, 3},
{4, 5, 6, 7},
}
fmt.Println(a) // Output is [[0 1 2 3] [4 5 6 7]]
Yes, this has its limits as seemingly you have to enumerate all the elements; but there are some tricks, namely you don't have to enumerate all values, only the ones that are not the zero values of the element type of the slice. For more details about this, see Keyed items in golang array initialization.
For example if you want a slice where the first 10 elements are zeros, and then follows 1 and 2, it can be created like this:
b := []uint{10: 1, 2}
fmt.Println(b) // Prints [0 0 0 0 0 0 0 0 0 0 1 2]
Also note that if you'd use arrays instead of slices, it can be created very easily:
c := [5][5]uint8{}
fmt.Println(c)
Output is:
[[0 0 0 0 0] [0 0 0 0 0] [0 0 0 0 0] [0 0 0 0 0] [0 0 0 0 0]]
In case of arrays you don't have to iterate over the "outer" array and initialize "inner" arrays, as arrays are not descriptors but values. See blog post Arrays, slices (and strings): The mechanics of 'append' for more details.
Try the examples on the Go Playground.
There are two ways to use slices to create a matrix. Let's take a look at the differences between them.
First method:
matrix := make([][]int, n)
for i := 0; i < n; i++ {
matrix[i] = make([]int, m)
}
Second method:
matrix := make([][]int, n)
rows := make([]int, n*m)
for i := 0; i < n; i++ {
matrix[i] = rows[i*m : (i+1)*m]
}
In regards to the first method, making successive make calls doesn't ensure that you will end up with a contiguous matrix, so the matrix may end up divided in memory. Let's think of an example with two goroutines that could cause this:
The routine #0 runs make([][]int, n) to get allocated memory for matrix, getting a piece of memory from 0x000 to 0x07F.
Then, it starts the loop and does the first row make([]int, m), getting from 0x080 to 0x0FF.
In the second iteration it gets preempted by the scheduler.
The scheduler gives the processor to routine #1 and it starts running. This one also uses make (for its own purposes) and gets from 0x100 to 0x17F (right next to the first row of routine #0).
After a while, it gets preempted and routine #0 starts running again.
It does the make([]int, m) corresponding to the second loop iteration and gets from 0x180 to 0x1FF for the second row. At this point, we already got two divided rows.
With the second method, the routine does make([]int, n*m) to get all the matrix allocated in a single slice, ensuring contiguity. After that, a loop is needed to update the matrix pointers to the subslices corresponding to each row.
You can play with the code shown above in the Go Playground to see the difference in the memory assigned by using both methods. Note that I used runtime.Gosched() only with the purpose of yielding the processor and forcing the scheduler to switch to another routine.
Which one to use? Imagine the worst case with the first method, i.e. each row is not next in memory to another row. Then, if your program iterates through the matrix elements (to read or write them), there will probably be more cache misses (hence higher latency) compared to the second method because of worse data locality. On the other hand, with the second method it may not be possible to get a single piece of memory allocated for the matrix, because of memory fragmentation (chunks spread all over the memory), even though theoretically there may be enough free memory for it.
Therefore, unless there's a lot of memory fragmentation and the matrix to be allocated is huge, you would always want to use the second method to take advantage of data locality.
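As a minimal sketch of the contiguity claim (using the unsafe package purely for inspection, not as something you would ship), you can check whether consecutive rows share adjacent storage under each method:

package main

import (
    "fmt"
    "unsafe"
)

// adjacent reports whether next's storage starts exactly where prev's ends.
func adjacent(prev, next []int) bool {
    end := uintptr(unsafe.Pointer(&prev[0])) + uintptr(len(prev))*unsafe.Sizeof(prev[0])
    return uintptr(unsafe.Pointer(&next[0])) == end
}

func main() {
    const n, m = 3, 4

    // First method: one allocation per row.
    a := make([][]int, n)
    for i := range a {
        a[i] = make([]int, m)
    }

    // Second method: one backing slice shared by all rows.
    b := make([][]int, n)
    rows := make([]int, n*m)
    for i := range b {
        b[i] = rows[i*m : (i+1)*m]
    }

    fmt.Println(adjacent(a[0], a[1])) // no guarantee: may print true or false
    fmt.Println(adjacent(b[0], b[1])) // always true: the rows slice one array
}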
With Go 1.18 you get generics.
Here is a function that uses generics to allow to create a 2D slice for any cell type.
func Make2D[T any](n, m int) [][]T {
matrix := make([][]T, n)
rows := make([]T, n*m)
for i, startRow := 0, 0; i < n; i, startRow = i+1, startRow+m {
endRow := startRow + m
matrix[i] = rows[startRow:endRow:endRow]
}
return matrix
}
With that function in your toolbox, your code becomes:
a := Make2D[uint8](dy, dx)
You can play with the code on the Go Playground.
Here is a concise way to do it:
value := [][]string{{"A1", "A2"}, {"B1", "B2"}}
P.S.: you can change string to whatever element type your slice uses.

WebGL - When should I call bindBuffer and vertexAttribPointer?

This question is specific to WebGL and assumes VAOs are not available.
I'm trying to make some little improvements to a 3D engine by limiting the number of low-level state changes. But it turns out I'm a bit confused about the proper way to use bindBuffer and vertexAttribPointer.
Let's say I want to draw 2 objects:
The first object makes use of two buffers A and C with an element buffer E;
the second object uses buffers B and C with the same element buffer E.
Buffers A and B use the same layout and are both referenced by location 0 while C is referenced by location 1.
Initially, ARRAY_BUFFER_BINDING points to null while ELEMENT_ARRAY_BUFFER_BINDING points to E.
The redundancy checker outputs the following with (A, B, C, E) = (3, 6, 5, 2):
Which means that:
bindBuffer(ELEMENT_ARRAY_BUFFER, [Buffer 2]) is unnecessary
vertexAttribPointer(1, 2, FLOAT, false, 0, 0) could've been avoided
Since WebGL can directly read ELEMENT_ARRAY_BUFFER_BINDING to know where indices are stored, 1. makes sense to me.
However, 2. implies that the buffer layout is stored inside the VBO, which is wrong because Buffer A and B are not seen as redundant on lines 15 and 30. (Several frames were rendered already, so they should have kept their state)
I think I'm confused about how drawElements know what buffer to use and where/when buffer layouts are stored.
What is the optimal use of bindBuffer and vertexAttribPointer for this example case and why?
Actually, I figured it out by simply looking at the source of the redundancy checker.
There are 2 important things to know:
Buffer layouts are bound per location and not per VBO.
vertexAttribPointer also assigns the currently bound ARRAY_BUFFER to the given location.
Internally, WebGL retains 6 parameters per location:
VERTEX_ATTRIB_ARRAY_SIZE_X
VERTEX_ATTRIB_ARRAY_TYPE_X
VERTEX_ATTRIB_ARRAY_NORMALIZED_X
VERTEX_ATTRIB_ARRAY_STRIDE_X
VERTEX_ATTRIB_ARRAY_POINTER_X
VERTEX_ATTRIB_ARRAY_BUFFER_BINDING_X
Here's what vertexAttribPointer does:
function vertexAttribPointer(indx, size, type, normalized, stride, offset) {
this.stateCache["VERTEX_ATTRIB_ARRAY_SIZE_" + indx] = size;
this.stateCache["VERTEX_ATTRIB_ARRAY_TYPE_" + indx] = type;
this.stateCache["VERTEX_ATTRIB_ARRAY_NORMALIZED_" + indx] = normalized;
this.stateCache["VERTEX_ATTRIB_ARRAY_STRIDE_" + indx] = stride;
this.stateCache["VERTEX_ATTRIB_ARRAY_POINTER_" + indx] = offset;
this.stateCache["VERTEX_ATTRIB_ARRAY_BUFFER_BINDING_" + indx] = this.stateCache["ARRAY_BUFFER_BINDING"];
}
Finally, WebGL Inspector was right! The state changes at lines 15 and 30 are necessary because VERTEX_ATTRIB_ARRAY_BUFFER_BINDING_0 changes.
Here's the optimal trace:
bindBuffer(ARRAY_BUFFER, A)
vertexAttribPointer(0, 3, FLOAT, false, 0, 0)
drawElements(TRIANGLES, 768, UNSIGNED_BYTE, 0)
bindBuffer(ARRAY_BUFFER, B)
vertexAttribPointer(0, 3, FLOAT, false, 0, 0)
drawElements(TRIANGLES, 768, UNSIGNED_BYTE, 0)
(bindBuffer(ARRAY_BUFFER, C) is not needed anymore, since we aren't doing anything with it.)

What is the point in setting a slice's capacity?

In Go, we can use the builtin make() function to create a slice with a given initial length and capacity.
Consider the following lines; the slice's length is set to 1 and its capacity to 3:
func main() {
var slice = make([]int, 1, 3)
slice[0] = 1
slice = append(slice, 6, 0, 2, 4, 3, 1)
fmt.Println(slice)
}
I was surprised to see that this program prints:
[1 6 0 2 4 3 1]
This got me wondering- what is the point of initially defining a slice's capacity if append() can simply blow past it? Are there performance gains for setting a sufficiently large capacity?
A slice is really just a fancy way to manage an underlying array. It automatically tracks size, and re-allocates new space as needed.
As you append to a slice, the runtime grows its capacity (roughly doubling it for small slices) whenever the new length would exceed the current capacity. It has to copy all of the existing elements into the new backing array to do that. If you know how big the slice will be before you start, you can avoid those copy operations and memory allocations by grabbing it all up front.
When you make a slice providing capacity, you set the initial capacity, not any kind of limit.
See this blog post on slices for some interesting internal details of slices.
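You can watch those reallocations happen by printing the capacity as a slice grows. A minimal sketch (the exact growth factors are an implementation detail and vary between Go versions):

package main

import "fmt"

func main() {
    var s []int
    prevCap := cap(s)
    for i := 0; i < 100; i++ {
        s = append(s, i)
        if cap(s) != prevCap {
            // A capacity change means append allocated a new backing array
            // and copied the existing elements into it.
            fmt.Printf("len=%d: cap %d -> %d\n", len(s), prevCap, cap(s))
            prevCap = cap(s)
        }
    }
}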
A slice is a wonderful abstraction over a plain array. You get all sorts of nice features, but deep down at its core lies an array. When you specify a capacity of 3, an array of length 3 is allocated in memory behind the scenes, and you can append up to that capacity without having it reallocate memory. The capacity argument is optional in make, but note that a slice will always have a capacity whether or not you choose to specify one. If you specify a length (which every slice also has), the slice will be indexable up to that length. The rest of the capacity is hidden away behind the scenes so that append does not have to allocate an entirely new array right away.
Here is an example to better explain the mechanics.
s := make([]int, 1, 3)
The underlying array will be allocated with 3 of the zero value of int (which is 0):
[0,0,0]
However, the length is set to 1, so the slice itself will only print [0], and if you try to index the second or third value, it will panic, as the slice's mechanics do not allow it. If you s = append(s, 1) to it, you will find that it has actually been created to contain zero values up to the length, and you will end up with [0,1]. At this point, you can append once more before the entire underlying array is filled, and another append will force it to allocate a new one and copy all the values over with a doubled capacity. This is actually a rather expensive operation.
Therefore the short answer to your question is that preallocating the capacity can be used to vastly improve the efficiency of your code. Especially so if the slice is either going to end up very large, or contains complex structs (or both), as the zero value of a struct is effectively the zero values of every single one of its fields. This is not because it would avoid allocating those values, as it has to anyway, but because append would have to reallocate new arrays full of these zero values each time it would need to resize the underlying array.
Short playground example: https://play.golang.org/p/LGAYVlw-jr
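The mechanics described above, as an inline sketch (the capacity after the final append is an implementation detail):

package main

import "fmt"

func main() {
    s := make([]int, 1, 3)
    fmt.Println(s, len(s), cap(s)) // [0] 1 3

    // s[1] = 7 // would panic: index out of range (beyond len, even though within cap)

    s = append(s, 1, 2)            // fills the remaining capacity; no reallocation
    fmt.Println(s, len(s), cap(s)) // [0 1 2] 3 3

    s = append(s, 3)               // exceeds the capacity: a new, larger array is allocated and copied
    fmt.Println(s, len(s), cap(s)) // e.g. [0 1 2 3] 4 6
}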
As others have already said, using the cap parameter can avoid unnecessary allocations. To give a sense of the performance difference, imagine you have a []float64 of random values and want a new slice that filters out values that are not above, say, 0.5.
Naive approach - no len or cap param
func filter(input []float64) []float64 {
ret := make([]float64, 0)
for _, el := range input {
if el > .5 {
ret = append(ret, el)
}
}
return ret
}
Better approach - using cap param
func filterCap(input []float64) []float64 {
ret := make([]float64, 0, len(input))
for _, el := range input {
if el > .5 {
ret = append(ret, el)
}
}
return ret
}
Benchmarks (n=10)
filter 131 ns/op 56 B/op 3 allocs/op
filterCap 56 ns/op 80 B/op 1 allocs/op
Using cap made the program 2x+ faster and reduced the number of allocations from 3 to 1. Now what happens at scale?
Benchmarks (n=1,000,000)
filter 9630341 ns/op 23004421 B/op 37 allocs/op
filterCap 6906778 ns/op 8003584 B/op 1 allocs/op
The speed difference is still significant (~1.4x) thanks to 36 fewer calls to runtime.makeslice. However, the bigger difference is the memory allocation (~4x less).
Even better - calibrating the cap
You may have noticed in the first benchmark that cap makes the overall memory allocation worse (80B vs 56B). This is because you allocate 10 slots but only need, on average, 5 of them. This is why you don't want to set cap unnecessarily high. Given what you know about your program, you may be able to calibrate the capacity. In this case, we can estimate that our filtered slice will need 50% as many slots as the original slice.
func filterCalibratedCap(input []float64) []float64 {
ret := make([]float64, 0, len(input)/2)
for _, el := range input {
if el > .5 {
ret = append(ret, el)
}
}
return ret
}
Unsurprisingly, this calibrated cap allocates 50% as much memory as its predecessor, so that's ~8x improvement on the naive implementation at 1m elements.
Another option - using direct access instead of append
If you are looking to shave even more time off a program like this, initialize with the len parameter (and ignore the cap parameter), access the new slice directly instead of using append, then throw away all the slots you don't need.
func filterLen(input []float64) []float64 {
ret := make([]float64, len(input))
var counter int
for _, el := range input {
if el > .5 {
ret[counter] = el
counter++
}
}
return ret[:counter]
}
This is ~10% faster than filterCap at scale. However, in addition to being more complicated, this pattern does not provide the same safety as cap if you try and calibrate the memory requirement.
With cap calibration, if you underestimate the total capacity required, then the program will automatically allocate more when it needs it.
With this approach, if you underestimate the total len required, the program will fail. In this example, if you initialize as ret := make([]float64, len(input)/2), and it turns out that len(output) > len(input)/2, then at some point the program will try to access a non-existent slot and panic.
Each time you add an item to a slice that has len(mySlice) == cap(mySlice), the underlying data structure is replaced with a larger structure.
fmt.Printf("Original Capacity: %v", cap(mySlice)) // Output: 8
mySlice = append(mySlice, myNewItem)
fmt.Printf("New Capacity: %v", cap(mySlice)) // Output: 16
Here, mySlice is replaced (through the assignment operator) with a new slice containing all the elements of the original mySlice, plus myNewItem, plus some room (capacity) to grow without triggering this resize.
As you can imagine, this resizing operation is computationally non-trivial.
Quite often, all the resize operations can be avoided if you know how many items you will need to store in mySlice. If you have this foreknowledge, you can set the capacity of the original slice upfront and avoid all the resize operations.
(In practice, it's quite often possible to know how many items will be added to a collection; especially when transforming data from one format to another.)
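For example, when transforming one slice into another with exactly one output element per input, the final size is known up front and the capacity can be reserved in a single allocation. A minimal sketch:

package main

import (
    "fmt"
    "strconv"
)

// intsToStrings converts each int to its decimal string form. The output has
// exactly one element per input, so the full capacity is reserved once and
// append never has to resize the backing array.
func intsToStrings(in []int) []string {
    out := make([]string, 0, len(in))
    for _, v := range in {
        out = append(out, strconv.Itoa(v))
    }
    return out
}

func main() {
    fmt.Println(intsToStrings([]int{4, 8, 15, 16, 23, 42})) // [4 8 15 16 23 42]
}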

Slices in Go: why does it allow appending more than the capacity allows?

The capacity parameter in making a slice in Go does not make much sense to me. For example,
aSlice := make([]int, 2, 2) //a new slice with length and cap both set to 2
aSlice = append(aSlice, 1, 2, 3, 4, 5) //append integers 1 through 5
fmt.Println("aSlice is: ", aSlice) //output [0, 0, 1, 2, 3, 4, 5]
If the slice allows inserting more elements than the capacity allows, why do we need to set it in the make() function?
The builtin append() function appends the specified elements to the given slice if it has a big enough capacity to accommodate them.
But if the passed slice is not big enough, append() allocates a new, sufficiently large slice, copies the elements from the passed slice into it, appends the new elements, and returns this new slice. Quoting from the append() documentation:
The append built-in function appends elements to the end of a slice. If it has sufficient capacity, the destination is resliced to accommodate the new elements. If it does not, a new underlying array will be allocated. Append returns the updated slice. It is therefore necessary to store the result of append, often in the variable holding the slice itself:
When making a slice with make if the length and capacity are the same, the capacity can be omitted, in which case it is defaulted to the specified length:
// These 2 declarations are equivalent:
s := make([]int, 2, 2)
s := make([]int, 2)
Also note that append() appends elements after the last element of the slice, and the above slices already have len(s) == 2 right after declaration, so appending even just 1 element causes a reallocation, as seen in this example:
s := make([]int, 2, 2)
fmt.Println(s, len(s), cap(s))
s = append(s, 1)
fmt.Println(s, len(s), cap(s))
Output:
[0 0] 2 2
[0 0 1] 3 4
So in your example what you should do is something like this:
s := make([]int, 0, 10) // Create a slice with length=0 and capacity=10
fmt.Println(s, len(s), cap(s))
s = append(s, 1)
fmt.Println(s, len(s), cap(s))
Output:
[] 0 10
[1] 1 10
I recommend the following blog articles if you want to understand slices in more details:
Go Slices: usage and internals
Arrays, slices (and strings): The mechanics of 'append'
It is mainly an optimization, and it is not unique to Go; similar structures in other languages have it as well.
When you append more than the capacity, the runtime needs to allocate more memory for the new elements. This is costly and can also cause memory fragmentation.
By specifying the capacity, the runtime allocates what is needed in advance and avoids reallocations. However, if you don't know the capacity in advance, or it changes, you don't have to set it; the runtime will reallocate what is needed and grow the capacity itself.

golang slice, slicing a slice with slice[a:b:c]

I read go slice usage and internals and Slice and Effective go#slice, but there is nothing about slicing a slice with 3 numbers like this: slice[a:b:c]
For example this code :
package main
import "fmt"
func main() {
var s = []string{"a", "b", "c", "d", "e", "f", "g"}
fmt.Println(s[1:2:6], len(s[1:2:6]), cap(s[1:2:6]))
fmt.Println(s[1:2:5], len(s[1:2:5]), cap(s[1:2:5]))
fmt.Println(s[1:2], len(s[1:2]), cap(s[1:2]))
}
The Go Playground result is this:
[b] 1 5
[b] 1 4
[b] 1 6
I can understand that the third one is something about capacity, but what is the exact meaning of this?
Do I miss something in documents?
The syntax has been introduced in Go 1.2, as I mentioned in "Re-slicing slices in Golang".
It is documented in Full slice expressions:
a[low : high : max]
constructs a slice of the same type, and with the same length and elements as the simple slice expression a[low : high].
Additionally, it controls the resulting slice's capacity by setting it to max - low.
Only the first index may be omitted; it defaults to 0.
After slicing the array a:
a := [5]int{1, 2, 3, 4, 5}
t := a[1:3:5]
the slice t has type []int, length 2, capacity 4, and elements
t[0] == 2
t[1] == 3
The design document for that feature had the following justification:
It would occasionally be useful, for example in custom []byte allocation managers, to be able to hand a slice to a caller and know that the caller cannot edit values beyond a given subrange of the true array.
The addition of append to the language made this somewhat more important, because append lets programmers overwrite entries between len and cap without realizing it or even mentioning cap.
2022: svitanok adds for Go 1.19+:
as long as the capacity of a "derivative" slice doesn't exceed the one specified by the third index during its creation, the slice is still backed by the same spot in memory as its original ("true") slice, so changes applied to it will affect the original slice.
And if then, for example, you append to this derivative slice the amount of elements that would cause its capacity to be increased, this new slice will occupy a different place in the memory, and so the changes made to it will not affect the slice it originated from.
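A minimal sketch of that aliasing behavior:

package main

import "fmt"

func main() {
    orig := []string{"a", "b", "c", "d", "e"}

    // Three-index slice: len 2, cap 2, still sharing orig's backing array.
    der := orig[1:3:3]

    der[0] = "B"      // within the shared array: orig sees the change
    fmt.Println(orig) // [a B c d e]

    // This append exceeds der's capacity, so der gets its own backing array.
    der = append(der, "X")
    der[0] = "Z"      // now modifies only the copy
    fmt.Println(orig) // [a B c d e] (unchanged)
    fmt.Println(der)  // [Z c X]
}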
In a slice expression slice[a:b:c] or aSlice[1:3:5]:
a:b or 1:3 -> gives the length (b - a)
a:c or 1:5 -> gives the capacity (c - a)
We can extract both length and capacity from a slice expression with 3 numbers/indices, without looking at the source slice/array.
expression| aSlice[low:high:max] or aSlice[a:b:c] or aSlice[1:3:7]
------------------------------------------------------------------------
Length | len(aSlice[low:high]) or len(aSlice[a:b]) or len(aSlice[1:3])
Capacity | len(aSlice[low:max]) or len(aSlice[a:c]) or len(aSlice[1:7])
------------------------------------------------------------------------
Read more here at Slice Expressions
Playground
A Go slice holds a pointer to an underlying array together with the length and capacity of its view into that array, so you can picture it as
pointer:length:capacity
and append adds new elements after the current length.
sl1 := make([]int, 6)
fmt.Println(sl1)
sl2 := append(sl1, 1)
fmt.Println(sl2)
[0 0 0 0 0 0]
[0 0 0 0 0 0 1]
