Memory usage in BufferAttribute - three.js

(It's a newbie question - I don't have much experience with WebGL.)
I'm trying to optimize memory usage, mostly so that our site stays usable on mobile devices.
Our meshes use BufferGeometry with several BufferAttribute instances (vertices, normals, colors, etc.), and as I see in IE DevTools, each BufferAttribute contains two major memory-consuming fields:
array - data for the given buffer attribute.
buffer - the WebGLBuffer, which contains a copy of the data from the array field.
As I understand it, in some situations the buffer gets recreated, and at that point the array data is reused. But if all the geometry is read-only, would it be safe to clear array to save memory? Or are there other situations where the WebGL buffer has to be recreated (say, the user switches to another browser tab and all the WebGL state has to be recreated on return)?

As I understand it, you need to keep the CPU data around in case the WebGL context is lost. In that case the GL objects all have to be recreated from the CPU data.
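To make that concrete, here is roughly what any renderer (three.js included) has to do at the GL level when a lost context is restored: the old buffer handles are gone, so fresh buffers must be created and refilled from the retained CPU-side arrays. A rough sketch in OpenGL ES style; the struct and function names are made up for illustration:

#include <GLES2/gl2.h>
#include <stddef.h>

/* Hypothetical CPU-side copy of one attribute, kept alive so the GL
 * buffer can be rebuilt after a context loss. */
typedef struct {
    const void *data;      /* retained "array" contents              */
    size_t      byteSize;  /* size of that data in bytes             */
    GLuint      vbo;       /* GL handle; dangling after context loss */
} Attribute;

/* Recreate the GL buffer from the retained CPU copy. If the array had
 * been freed, there would be nothing left to upload here. */
static void restoreAttribute(Attribute *attr)
{
    glGenBuffers(1, &attr->vbo);
    glBindBuffer(GL_ARRAY_BUFFER, attr->vbo);
    glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)attr->byteSize,
                 attr->data, GL_STATIC_DRAW);
}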

Related

About the meaning of glInvalidateFramebuffer

I have a question about the general use of glInvalidateFramebuffer:
As far as I know, the purpose of glInvalidateFramebuffer is to "skip the store of framebuffer contents that are no longer needed". Its main purpose on tile-based GPUs is to get rid of depth and stencil contents if only the color is needed after rendering. I do not understand why this is necessary. As far as I know, if I render to an FBO then all of this data is stored in that FBO. Now if, in a subsequent draw, I use only the color contents of that FBO, or nothing from it at all, why is the depth/stencil data accessed at all? It is supposedly stored somewhere and that eats bandwidth, but as far as I can tell it is already in the FBO's GPU memory as the result of the render, so when does that supposedly expensive additional store operation happen?
There are supposedly expensive preservation steps for FBO attachments, but why are those necessary if the data is already in GPU memory as a result of the render?
Regards
Framebuffers in a tile-based GPU exist in two places - the version stored in main memory (which is persistent), and the working copy inside the GPU tile buffer (which only exists for the duration of that tile's fragment shading). The content of the tile buffer is written back to the buffer in main memory at the end of shading for each tile.
The objective of tile-based shading is to keep as much of the state as possible inside that tile buffer and avoid writing it back to main memory when it's not needed. This is important because main memory DRAM accesses are phenomenally power hungry. Invalidation at the end of each render pass tells the graphics stack that those buffers don't need to be persisted, which means the write-back from the tile buffer to main memory can be avoided.
I've written a longer article on it here if you want more detail:
https://developer.arm.com/solutions/graphics/developer-guides/understanding-render-passes/single-view
For non-tile-based GPUs the main use case seems to be using it as a lower cost version of a clear at the start of a render pass if you don't actually care about the starting color. It's likely there is little benefit to using it at the end of the render pass (but it should not hurt either).
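For concreteness, a typical end-of-pass invalidation on OpenGL ES 3.0 (or later) looks something like the sketch below; which attachments you list depends on what the rest of your frame actually needs to keep:

#include <GLES3/gl3.h>

/* After the last draw of the pass, tell the driver that the depth and
 * stencil contents of the currently bound FBO will never be read again,
 * so the tile buffer does not have to write them back to main memory. */
static const GLenum discardList[] = { GL_DEPTH_ATTACHMENT, GL_STENCIL_ATTACHMENT };

void endRenderPass(void)
{
    glInvalidateFramebuffer(GL_FRAMEBUFFER, 2, discardList);
    /* ...then move on to the next render target or eglSwapBuffers(). */
}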

Rendering only to a part of a texture

Can I bind a 2000x2000 texture to a color attachment in an FBO, and tell OpenGL to behave exactly as if the texture were smaller, let's say 1000x1000?
The point is, in my rendering cycle I need many (mostly small) intermediate textures to render to, but only one at a time. I am thinking that, rather than creating many smaller textures, I will have just one appropriately large one, bind it to the FBO at hand, tell OpenGL to render only to part of it, and save memory.
Or maybe I should be destroying/recreating those textures many times per frame? That would certainly save even more memory, but wouldn't that cause a noticeable slowdown?
Can I bind a 2000x2000 texture to a color attachment in an FBO, and tell OpenGL to behave exactly as if the texture were smaller, let's say 1000x1000?
Yes, just set glViewport() to the region you want to render to, and remember to adjust glScissor() bounding regions if you are ever enabling scissor testing.
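A sketch of what that looks like, assuming fbo is a complete framebuffer that already has the 2000x2000 texture attached and you only want the lower-left 1000x1000 region:

#include <GLES2/gl2.h>

/* Direct all rendering into a 1000x1000 sub-region of the large
 * color attachment. */
void beginSmallTarget(GLuint fbo)
{
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);

    /* Restrict rasterization to the sub-region. */
    glViewport(0, 0, 1000, 1000);

    /* Clears ignore the viewport, so clamp them (and any other
     * full-surface operations) with the scissor box as well. */
    glEnable(GL_SCISSOR_TEST);
    glScissor(0, 0, 1000, 1000);
}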
Or maybe I should be destroying/recreating those textures many times per frame? That would certainly save even more memory, but wouldn't that cause a noticeable slowdown?
Completely destroying and recreating a new texture object every frame will be slow because it will cause constant memory reallocation overhead, so definitely don't do that.
Having a pool of pre-allocated textures which you cycle through is fine though - that's a pretty common technique. You won't really save much memory storing one 2K*2K texture vs. four separate 1K*1K textures - the total storage requirement is the same and the additional metadata overhead is tiny in comparison - so if keeping them separate is easier in terms of application logic, I'd suggest doing that.
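If you do go the pooled route, the bookkeeping is small. A sketch, assuming the 1000x1000 textures were all created up front with the same format (the pool size and names are illustrative):

#include <GLES2/gl2.h>

#define POOL_SIZE 4

/* Pre-allocated 1000x1000 scratch textures, created once at startup. */
static GLuint texturePool[POOL_SIZE];
static int    nextTexture = 0;

/* Attach the next pooled texture to the FBO; nothing is allocated or
 * freed per frame, the same storage is simply recycled. */
void bindNextScratchTarget(GLuint fbo)
{
    GLuint tex = texturePool[nextTexture];
    nextTexture = (nextTexture + 1) % POOL_SIZE;

    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0);
    glViewport(0, 0, 1000, 1000);
}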

Best practices for generating GL VBO data

I'm new to OpenGL ES and am using a VBO to speed up the rendering of a large number of objects. Currently I query a quadtree to determine which objects should appear in the viewport and use that data to update my VBO, like so:
glBufferData(GL_ARRAY_BUFFER,
             numObjects * sizeof(ObjectStruct),
             objects,
             GL_STREAM_DRAW);
But is this the most efficient way of handling things? Should I instead avoid querying the quadtree and simply generate a static VBO containing information on all of my objects. If I do this, will GL be intelligent enough to cull objects which are positioned outside of the viewport?
GL won't know what's outside of the view frustum until vertex transformations have been executed. Objects outside of the view frustum will be culled then, to avoid redundant operations later in the pipeline. This does, however, mean that the driver and GPU have redundantly processed draws that have no effect on the final image.
I'd recommend avoiding regular VBO updates, as each one will either cause the graphics driver to stall or cause additional memory allocations under the hood (enabling the GPU to continue rendering previous frames without interruption). I've written a blog post on the subject which may help: http://blog.imgtec.com/powervr/how-to-improve-your-renderer-on-powervr-based-platforms
The best approach depends on your use case. Ideally, you would batch a number of objects within a quadtree/octree node into a single draw. Doing so would enable your engine to reduce the amount of rendering work submitted without incurring the cost of VBO modifications.
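If you do need to stream per-frame data, one common way to avoid the stall/reallocation issue mentioned above is to cycle through a small ring of buffers, so the buffer you are overwriting is (almost certainly) no longer being read by the GPU. A sketch; the buffer count and names are illustrative:

#include <GLES2/gl2.h>
#include <stddef.h>

#define NUM_STREAM_BUFFERS 3   /* roughly one per frame in flight */

static GLuint streamVbo[NUM_STREAM_BUFFERS];  /* created at startup */
static int    frameIndex = 0;

/* Upload this frame's visible objects into the next buffer in the ring.
 * GL_STREAM_DRAW hints that the data is rewritten every frame and drawn
 * only a few times. */
void uploadVisibleObjects(const void *objects, size_t numObjects,
                          size_t objectSize)
{
    frameIndex = (frameIndex + 1) % NUM_STREAM_BUFFERS;

    glBindBuffer(GL_ARRAY_BUFFER, streamVbo[frameIndex]);
    glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)(numObjects * objectSize),
                 objects, GL_STREAM_DRAW);
    /* ...set up vertex attribute pointers and issue the draw as usual. */
}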

What are the non-trivial use-cases of attributes in WebGL/OpenGL in general?

I'm making a WebGL game and eventually came up with a pretty convenient concept of object templates, where game objects of the same kind (say, characters of the same race) use the same template (which means: buffers, attributes, and shader program) and are instanced from that template by specifying a set of uniforms (which are, in fact, the most common difference between same-kind objects: model matrix, textures, bone positions, etc.). To make independent objects with their own deep copy of the buffers, I just deep-copy and re-initialize the original template and start instantiating new objects from it.
But after that I started having doubts. Say, if I start morphing objects by explicitly editing their vertices, this approach will require a separate template for every object of that kind (otherwise they would all morph in exactly the same phase). That is probably fine for this particular case, because I'll most likely need to recalculate normals and even texture coordinates, which means most of the buffers anyway.
But what if I'm missing some very common use of attributes, say, blood decals, which would require me to update only a small piece of the buffer? In that case, it would be much more reasonable to have two buffers for each object: a common one shared by them all, and one for blood decals that is unique to each of them. And, as blood is usually spilled on everything, this sounds pretty reasonable: we would save a lot of space by storing vertices, normals and such without unnecessary duplication.
I haven't tried implementing decals yet, so honestly I'm not even sure whether implementing them using vertex painting (textured or not) is the right choice. But I'm also pretty sure there are some commonly used attributes aside from vertices, normals and texture coordinates.
Here are some that I managed to come up with myself:
decals (probably better to be modelled as separate objects?)
bullet holes and such (same as decals maybe?)
Any thoughts?
UPD: as all this might sound confusing, I want to clarify: I do understand that using as few buffers as possible is a good thing; this is exactly why I'm trying to use this template concept. My question is: what are the possible cases where using a single buffer and a single element buffer (both of them shared between similar objects) for a template is going to stab me in the back?
Keeping a giant chunk of data that won't change on the card is incredibly useful for saving bandwidth. Additionally, you probably won't be directly changing the vertex positions once they are on the card. Instead you will probably morph them with passed-in uniforms in the vertex shader through skeletal animation. Read about it here: Skeletal Animation
Do keep in mind, though, that in keyframe animation with meshes you would keep a bunch of buffers on the card, each holding a different keyframe pose of the animation. You would then bind whichever two keyframes you want to interpolate between as attributes and blend between them (you can have more than two). Keyframe Animation
Additionally, with the introduction of transform feedback (not available in WebGL 1, which is based on OpenGL ES 2.0; the feature became core in OpenGL 3.0 and is exposed in WebGL 2 / OpenGL ES 3.0), you can start keeping calculated data GPU-side. In other words, you can run a giant particle-system simulation in the vertex or geometry shader and store the calculated data into another buffer, then use that buffer in the next frame without a round trip from the GPU to the CPU. Read about it here: Transform Feedback and here: Transform Feedback how to
In general, you don't want to touch buffers once they are on the card, especially every frame. Instead load several and use pointers to that data in shaders as attributes.
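To the "update only a small piece" scenario in the question: the usual pattern is to split rarely-changing and frequently-changing attributes into separate buffers, so only the small dynamic one is ever touched. A sketch, assuming a hypothetical per-vertex damage/decal weight stored as one float per vertex (names are illustrative):

#include <GLES2/gl2.h>
#include <stddef.h>

/* The shared, static VBO holds positions/normals/UVs for every instance
 * of a template; each instance owns only this small dynamic buffer. */
typedef struct {
    GLuint damageVbo;   /* per-instance buffer, one float per vertex */
    size_t vertexCount;
} Instance;

/* Rewrite just the dirty range of the instance's dynamic attribute;
 * the big shared geometry buffers are never modified. */
void updateDamage(Instance *inst, size_t firstVertex, size_t count,
                  const float *values)
{
    glBindBuffer(GL_ARRAY_BUFFER, inst->damageVbo);
    glBufferSubData(GL_ARRAY_BUFFER,
                    (GLintptr)(firstVertex * sizeof(float)),
                    (GLsizeiptr)(count * sizeof(float)),
                    values);
}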

Sharing VBOs across multiple mesh objects

I'm working on a very small game engine that uses OpenGL ES 2.0. I'm having a bit of a design issue with integrating VBOs into my Mesh Class.
The problem is that I don't want to instantiate a new VBO for each mesh, and I want the VBO size to be determined by the number of meshes I load into it (not just a fixed size of 2MB or something).
Since there's no realloc function for VBOs, I need to batch load all my vertex data at once. This is ok, since I only have 4 or 5 small meshes. So I created a MeshList class.
I call MeshList.AddMesh(Mesh mesh) and it aggregates the vertex/index data of the mesh object and returns the offsets into the array of vertex data/index data back to the mesh that was added. This way the mesh knows where it is in the VBO (but not which VBO it's in).
However, none of the MeshList data is uploaded into a VBO until I call MeshList.BindToVBO(). But now, none of my meshes know which VBO they're in. So I was thinking of creating an array of pointers in MeshList that point to integer member variables in each Mesh class that would hold the VBO Handle. This way, when BindToVBO() is called, it iterates over the pointer array and updates the VBO Handles in the mesh objects.
I figured, this way it gives me the flexibility of having different mesh objects in different VBOs or all in one VBO. The only concern I have is whether or not this is a good design.
It's not clear to someone glancing at the code that MeshList.BindToVBO() is updating a whole bunch of mesh objects. I mean, MeshList does interact with all of the Mesh objects prior to the BindToVBO() call, but there's nothing explicitly saying that by passing a Mesh object to MeshList.AddMesh(), you're essentially subscribing its VBOHandle member to updates at some point in the future.
I've tried to make this as clear as I can. Let me know if something needs clarification.
Honestly, to me this sounds like a lot of trouble for a dubious payoff. Do you have a reason to believe that putting multiple meshes in the same buffer is going to make a noticeable difference in your performance?
It sounds like premature optimization to me.
Sure, if you have a particle system with 50,000 particles I could see wanting that to be in a shared buffer, but in general I don't know if there's a benefit to storing two arbitrary meshes in the same buffer. It just sounds like a huge potential for bugs and headaches.
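For what it's worth, if you do decide to go the shared-buffer route, the bookkeeping the question describes can stay quite small. A rough C-style sketch of one way to carry the offsets (the names are hypothetical, not taken from the question's code):

#include <GLES2/gl2.h>
#include <stddef.h>

/* Filled in by AddMesh(): where this mesh's data lives inside the
 * aggregated arrays. The buffer handles are filled in by BindToVBO(). */
typedef struct {
    size_t vertexByteOffset;   /* offset into the shared vertex data */
    size_t indexByteOffset;    /* offset into the shared index data  */
    GLuint vbo;
    GLuint ibo;
} MeshLocation;

/* At draw time the mesh binds the shared buffers and offsets into them. */
void drawMesh(const MeshLocation *loc, GLsizei indexCount)
{
    glBindBuffer(GL_ARRAY_BUFFER, loc->vbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, loc->ibo);

    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0,
                          (const void *)loc->vertexByteOffset);
    glEnableVertexAttribArray(0);

    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT,
                   (const void *)loc->indexByteOffset);
}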
