I have different meshes with different VBOs, some may have normals, some not, etc. Every mesh also has its VAO with all VBOs being bound.
Then I draw all meshes with instancing. I plan to use a shared global VBO of mat4 to store the transformations, which are recalculated every frame. Every VAO also needs to point at this shared VBO. The number of instances of each mesh may vary.
But I guess we want to reduce the number of upload commands sent to the GPU, which is why I want to accumulate all the matrices in contiguous memory and send them with a single glBufferSubData command.
Different batches of different instanced meshes want to use different segments of the shared VBO to read the matrices from. So I need to update VAOs each frame as well.
The question is: how should I perform this in a better way? And is such an architecture actually a good one? I guess I should use glBindVertexBuffer for the shared VBO on each VAO, so I update the offset and size of the segments, and VAOs are lightweight, but is it really a standard solution?
You should not be concerned with updating VAOs. In fact, you should not have one VAO per mesh at all; have one VAO per vertex format (that is, the state set by glVertexAttribFormat and glEnable/DisableVertexAttribArray), and try to make all of your meshes use the same vertex format. Setting buffer binding state is much cheaper than setting vertex format state.
So the idea ought to be that you bind a VAO for a vertex format, then iterate through all of the objects that use that format, using glBindVertexBuffer as needed for their individual data.
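As a rough illustration of the single-upload plan from the question, the sketch below packs every batch's matrices into one contiguous array and records where each batch's segment starts. The names Mat4, Batch, Segment and packInstanceMatrices are hypothetical and only cover the CPU-side bookkeeping; no OpenGL calls are made.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 4x4 float matrix (for example glm::mat4).
struct Mat4 { float m[16]; };

// Hypothetical per-mesh batch: the instance transforms computed this frame.
struct Batch {
    std::vector<Mat4> transforms;
};

// Where one batch's matrices live inside the shared instance buffer.
struct Segment {
    std::size_t byteOffset;    // offset of the batch's first matrix, in bytes
    std::size_t instanceCount; // number of instances to draw for this batch
};

// Appends every batch's matrices to `packed` (cleared first) and returns
// one segment per batch, in the same order as `batches`.
std::vector<Segment> packInstanceMatrices(const std::vector<Batch>& batches,
                                          std::vector<Mat4>& packed) {
    packed.clear();
    std::vector<Segment> segments;
    segments.reserve(batches.size());
    for (const Batch& batch : batches) {
        segments.push_back({packed.size() * sizeof(Mat4), batch.transforms.size()});
        packed.insert(packed.end(), batch.transforms.begin(), batch.transforms.end());
    }
    return segments;
}
```

With this layout, packed.data() could be uploaded with one glBufferSubData call, and each batch would then pass its byteOffset as the offset argument of glBindVertexBuffer (with a stride of sizeof(Mat4)) before its instanced draw.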
In modern OpenGL (3.x+), you create buffer objects which contain vertex attributes, such as positions, colors, normals, texture coordinates, and indices.
These buffers are then assigned to a corresponding vertex array object (VAO) which essentially contains pointers to all of the data as well as the data's format.
There are many tutorials out there for how to create a VAO and how to use it; unfortunately, it isn't clear how VAOs should be used for larger applications or games.
For example, a game might contain many 3D models, and it seems appropriate to separate each model by a different VAO.
On the other hand, a particle system contains many disconnected primitives traveling independent of one another. In this scenario, using a single VAO per system might improve performance in CPU-GPU transfers. However, in this case, the primitives need to be translated differently than one another, so it might seem viable to separate each particle into a very tiny VAO.
Question:
For a large quantity of small data sets (such as a particle system of quads), should all of the data be packed into 1 VAO or divided into many VAOs? What are the performance benefits/drawbacks of each method?
Assuming 1 VAO is used, the only apparent way to translate each independent sub-unit of data is to modify the actual position information and reload it into the GPU. Doing this many times is costly in terms of time.
Assuming many VAOs are used, the GPU must store duplicate formatting information for each VAO. This seems to be costly in terms of space (but I'm not sure if this is necessarily slow).
Side-Note:
Yes, I'm personally interested in managing a particle system. To keep this question more generic, and more useful for others, I am asking about VAO management as a whole. I am curious what management methods are more suitable vs others when considering the type of data being stored and when considering what type of performance is desired (time/space).
VAO creation is described well here:
https://www.opengl.org/wiki/Vertex_Specification
In the case of particles it would be best to use instanced rendering, where you can render all the particles in a single draw call but assign a different position to each one as an attribute. You can update an existing buffer using glBufferSubData. That way you can update the positions on the CPU side between frames and then upload them to the buffer.
In more complex examples you can instance whichever attributes you want to.
The way I call instanced rendering and set it up in my code is as follows:
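void CreateInstancedAttrib(unsigned int attribNum, GLuint VAO, GLuint& posVBO, int numInstances){
    glBindVertexArray(VAO);
    // Allocate the per-instance position buffer. CreateVertexArrayBuffer is the author's own
    // helper; it is assumed to leave the new buffer bound to GL_ARRAY_BUFFER.
    posVBO = CreateVertexArrayBuffer(0, sizeof(vec3), numInstances, GL_DYNAMIC_DRAW);
    glEnableVertexAttribArray(attribNum);
    glVertexAttribPointer(attribNum, 3, GL_FLOAT, GL_FALSE, sizeof(vec3), 0);
    // Advance this attribute once per instance instead of once per vertex.
    glVertexAttribDivisor(attribNum, 1);
    glBindVertexArray(0);
}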
Here posVBO receives the handle of the per-instance position buffer, and the lines that follow set up the attribute that reads positions from it.
When rendering:
void RenderInstancedStaticMesh(const StaticMesh& mesh, MaterialUniforms& uniforms, const vec3* positions){
    for (unsigned int meshNum = 0; meshNum < mesh.m_numMeshes; meshNum++){
        if (mesh.m_meshData[meshNum]->m_hasTexture){
            glBindTexture(GL_TEXTURE_2D, mesh.m_meshData[meshNum]->m_texture);
        }
        // Despite its name, m_vertexBuffer is used as the VAO handle here.
        glBindVertexArray(mesh.m_meshData[meshNum]->m_vertexBuffer);
        // Upload this frame's instance positions into the existing buffer.
        glBindBuffer(GL_ARRAY_BUFFER, mesh.m_meshData[meshNum]->m_instancedDataBuffer);
        glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(vec3) * mesh.m_numInstances, positions);
        glUniform3fv(uniforms.diffuseUniform, 1, &mesh.m_meshData[meshNum]->m_material.diffuse[0]);
        glUniform3fv(uniforms.specularUniform, 1, &mesh.m_meshData[meshNum]->m_material.specular[0]);
        glUniform3fv(uniforms.ambientUniform, 1, &mesh.m_meshData[meshNum]->m_material.ambient[0]);
        glUniform1f(uniforms.shininessUniform, mesh.m_meshData[meshNum]->m_material.shininess);
        // One draw call renders every instance of this sub-mesh.
        glDrawElementsInstanced(GL_TRIANGLES, mesh.m_meshData[meshNum]->m_numFaces * 3,
                                GL_UNSIGNED_INT, 0, mesh.m_numInstances);
    }
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindVertexArray(0);
}
That's a lot to take in, but the important calls are glDrawElementsInstanced and glBufferSubData.
If you look up both functions, I'm sure you will come to understand how instanced rendering works.
If you have any more questions, please ask.
The general rule is that you want to minimize the number of draw calls. If you put things into individual VAOs, you have to perform a draw call for each VAO. Switching between VAOs and VBOs also comes with a cost. Don't think of VAOs and VBOs as "model" containers, but as memory pools, where each VBO/VAO should be used to coalesce data with identical properties.
A particle system is the perfect candidate for putting everything into a single VBO/VAO, usually with instanced rendering, where the VBO contains the information about where to place each particle.
I'm new to OpenGL ES and am using a VBO to speed up the rendering of a large number of objects. Currently I query a quadtree to determine which objects should appear in the viewport and use that data to update my VBO, like so:
glBufferData(
GL_ARRAY_BUFFER,
numObjects*sizeof(ObjectStruct),
objects,
GL_STREAM_DRAW
);
But is this the most efficient way of handling things? Should I instead avoid querying the quadtree and simply generate a static VBO containing information on all of my objects? If I do this, will GL be intelligent enough to cull objects which are positioned outside of the viewport?
GL won't know what's outside of the view frustum until vertex transformations have been executed. Objects outside of the view frustum will be culled to avoid redundant operations later in the pipeline. This does, however, mean that the driver and GPU have redundantly processed draws that will have no effect on the final image.
I'd recommend avoiding regular VBO updates, as each will either cause the graphics driver to stall or cause an additional memory allocation under the hood (enabling the GPU to continue rendering previous frames without interruption). I've written a blog post on the subject which may help: http://blog.imgtec.com/powervr/how-to-improve-your-renderer-on-powervr-based-platforms
The best approach depends on your use case. Ideally, you would batch a number of objects within a quadtree/octree node into a single draw. Doing so would enable your engine to reduce the amount of rendering work submitted without incurring the cost of VBO modifications.
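One way to picture the per-node batching is sketched below, assuming each quadtree node owns a fixed vertex range inside a single static VBO and the nodes are listed in buffer order. NodeRange, DrawRange and buildDrawRanges are hypothetical names; the visibility flag stands in for the result of the CPU-side quadtree query.

```cpp
#include <vector>

// Vertex range a quadtree node owns inside one static VBO (hypothetical layout).
struct NodeRange {
    int firstVertex;
    int vertexCount;
    bool visible; // result of the CPU-side quadtree/frustum query
};

// Arguments for one draw call: first vertex and vertex count.
struct DrawRange {
    int first;
    int count;
};

// Collects the ranges of visible nodes. Ranges that are adjacent in the
// buffer are merged, so neighbouring visible nodes share one draw call.
// Assumes `nodes` is sorted by firstVertex.
std::vector<DrawRange> buildDrawRanges(const std::vector<NodeRange>& nodes) {
    std::vector<DrawRange> draws;
    for (const NodeRange& node : nodes) {
        if (!node.visible || node.vertexCount == 0) {
            continue;
        }
        if (!draws.empty() &&
            draws.back().first + draws.back().count == node.firstVertex) {
            draws.back().count += node.vertexCount; // extend the previous draw
        } else {
            draws.push_back({node.firstVertex, node.vertexCount});
        }
    }
    return draws;
}
```

Each resulting range would map to one glDrawArrays(GL_TRIANGLES, first, count) call, and the VBO contents are never modified.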
I'm making a WebGL game and eventually came up with a pretty convenient concept of object templates: game objects of the same kind (say, characters of the same race) use the same template (which means buffers, attributes and shader program), and are instanced from that template by specifying a set of uniforms (which are, in fact, the most common difference between same-kind objects: model matrix, textures, bone positions, etc). To make independent objects with their own deep copy of the buffers, I just deep-copy and re-initialize the original template and start instantiating new objects from it.
But after that I started having doubts. Say, if I start using morphing on objects, by explicit editing of the vertices, this approach will require me to make a separate template for every object of such kind (otherwise, they would start morphing in exactly the same phase). Which is probably fine for this very case, 'cause I'll most likely need to recalculate normals and even texture coordinates, which means – most of the buffers.
But what if I'm missing some very common case of using attributes, say, blood decals, which will require me to update only a small piece of the buffer? In that case, it would be much more reasonable to have two buffers for each object: a common one that is shared by them all and one for blood decals, which is unique to each of them. And, as blood is usually spilled on everything, this sounds pretty reasonable: we would save a lot of space by storing vertices, normals and such without unnecessary duplication.
I haven't tried implementing decals yet, so honestly not even sure if implementing them using vertex painting (textured or not) is the right choice. But I'm also pretty sure there are some commonly used attributes aside from vertices, normals and texture coordinates.
Here are some that I managed to come up with myself:
decals (probably better to be modelled as separate objects?)
bullet holes and such (same as decals maybe?)
Any thoughts?
UPD: as all this might sound confusing, I want to clarify: I do understand that using as few buffers as possible is a good thing, which is exactly why I'm trying to use this template concept. My question is: what are the possible cases where using a single buffer and a single element buffer (both shared between similar objects) for a template is going to stab me in the back?
Keeping a giant chunk of data that won't change on the card is incredibly useful for saving bandwidth. Additionally, you probably won't be directly changing the vertex positions once they are on the card. Instead you will probably morph them in the vertex shader with passed-in uniforms, through skeletal animation. Read about it here: Skeletal Animation
Do keep in mind though, that in Key frame animation with meshes, you would keep a bunch of buffers on the card each in a different key frame pose of the animation. However, you would then load whatever two key frames you want to interpolate over in as attributes and then blend between them (You can have more than two). Keyframe Animation
Additionally, with the introduction of Transform Feedback (no, you don't get to use it in WebGL 1.0: it became core in OpenGL 3.0, whereas WebGL 1.0 is based on OpenGL ES 2.0, which is based on OpenGL 2.0), you can start keeping calculated data GPU side. In other words, you can do a giant particle system simulation in the vertex or geometry shader, store the calculated data in another buffer, and then use that buffer in the next frame without a round trip from the GPU to the CPU. Read about it here: Transform Feedback and here: Transform Feedback how to
In general, you don't want to touch buffers once they are on the card, especially every frame. Instead load several and use pointers to that data in shaders as attributes.
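To make the key-frame blending mentioned above concrete, here is a minimal CPU-side sketch of the arithmetic a vertex shader would perform on two key-frame attribute streams. Vec3, KeyframePair, selectKeyframes and blendKeyframes are hypothetical names, and the sketch assumes evenly spaced, looping key frames with a non-negative time.

```cpp
#include <cmath>
#include <cstddef>

struct Vec3 { float x, y, z; };

// Which two key frames to bind as attributes, and the blend factor between them.
struct KeyframePair {
    std::size_t from;
    std::size_t to;
    float t; // 0 = fully `from`, 1 = fully `to`
};

// Picks the two key frames surrounding `timeSeconds` for a looping animation.
// Assumes timeSeconds >= 0, secondsPerFrame > 0 and frameCount > 0.
KeyframePair selectKeyframes(float timeSeconds, float secondsPerFrame,
                             std::size_t frameCount) {
    float framePos = timeSeconds / secondsPerFrame;
    float whole = std::floor(framePos);
    std::size_t from = static_cast<std::size_t>(whole) % frameCount;
    return { from, (from + 1) % frameCount, framePos - whole };
}

// Linear blend of one vertex position between two key-frame poses.
Vec3 blendKeyframes(const Vec3& a, const Vec3& b, float t) {
    return { a.x + (b.x - a.x) * t,
             a.y + (b.y - a.y) * t,
             a.z + (b.z - a.z) * t };
}
```

In a real renderer only selectKeyframes would run on the CPU each frame; the blend itself would live in the vertex shader, with t passed as a uniform, so the key-frame buffers are never rewritten.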
I'm working on a very small game engine that uses OpenGL ES 2.0. I'm having a bit of a design issue with integrating VBOs into my Mesh Class.
The problem is that I don't want to instantiate a new VBO for each mesh, and I want the VBO size to be determined by the number of meshes I load into it (not just a fixed size of 2MB or something).
Since there's no realloc function for VBOs, I need to batch load all my vertex data at once. This is ok, since I only have 4 or 5 small meshes. So I created a MeshList class.
I call MeshList.AddMesh(Mesh mesh) and it aggregates the vertex/index data of the mesh object and returns the offsets into the array of vertex data/index data back to the mesh that was added. This way the mesh knows where it is in the VBO (but not which VBO it's in).
However, none of the MeshList data is uploaded into a VBO until I call MeshList.BindToVBO(). But now, none of my meshes know which VBO they're in. So I was thinking of creating an array of pointers in MeshList that point to integer member variables in each Mesh class that would hold the VBO Handle. This way, when BindToVBO() is called, it iterates over the pointer array and updates the VBO Handles in the mesh objects.
I figured, this way it gives me the flexibility of having different mesh objects in different VBOs or all in one VBO. The only concern I have is whether or not this is a good design.
It's not clear to someone glancing at the code that MeshList.BindToVBO() is updating a whole bunch of mesh objects. I mean, MeshList does interact with all of the Mesh objects prior to the BindToVBO() call, but there's nothing explicitly saying that by passing a Mesh object to MeshList.AddMesh(), it's essentially subscribing its VBOHandle members to updates at some point in the future.
I've tried to make this as clear as I can. Let me know if something needs clarification.
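The aggregation step described above might look roughly like the following sketch. It only covers the CPU-side bookkeeping; MeshRange and the member names are hypothetical, the actual upload into a VBO is left out, and 16-bit indices are assumed because that is the index type OpenGL ES 2.0 guarantees.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Where one mesh sits inside the aggregated arrays (hypothetical names).
struct MeshRange {
    std::size_t firstVertexFloat; // offset into the vertex array, in floats
    std::size_t firstIndex;       // offset into the index array, in indices
    std::size_t indexCount;       // number of indices belonging to this mesh
};

// Accumulates vertex and index data from several meshes so that everything
// can later be uploaded in one go.
class MeshList {
public:
    // Appends the mesh data and returns its location in the combined arrays.
    // Indices are stored unchanged, so the vertex offset has to be applied
    // when the attribute pointers are set up for the draw.
    MeshRange addMesh(const std::vector<float>& vertices,
                      const std::vector<std::uint16_t>& indices) {
        MeshRange range{vertices_.size(), indices_.size(), indices.size()};
        vertices_.insert(vertices_.end(), vertices.begin(), vertices.end());
        indices_.insert(indices_.end(), indices.begin(), indices.end());
        return range;
    }

    const std::vector<float>& vertices() const { return vertices_; }
    const std::vector<std::uint16_t>& indices() const { return indices_; }

private:
    std::vector<float> vertices_;
    std::vector<std::uint16_t> indices_;
};
```

Returning the range by value keeps AddMesh free of hidden side effects on the Mesh objects, which is the part of the design the question is uneasy about.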
Honestly, to me this sounds like a lot of trouble for a dubious payoff. Do you have a reason to believe that putting multiple meshes in the same buffer is going to make a noticeable difference in your performance?
It sounds like premature optimization to me.
Sure, if you have a particle system with 50,000 particles I could see wanting that to be in a shared buffer, but in general I don't know if there's a benefit to storing two arbitrary meshes in the same buffer. It just sounds like a huge potential for bugs and headaches.
I wanted to know whether OpenGL VBOs are meant to be used only for large geometry arrays, or whether it makes sense to use them even for small arrays. I have code that positions a variety of geometry relative to other geometry, but some of the "leaf" geometry objects are quite small, 10x10 quad spheres and the like (200 triangles apiece). I may have a very large number of these small leaf objects. I want to be able to have a different transform matrix for each of these small leaf objects. It seems I have 2 options:
Use a separate VBO for each leaf object. I may end up with a large number of VBOs.
I could store data for multiple objects in one VBO, and apply the appropriate transforms myself when changing the data for a given leaf object. This seems strange because part of the point of OpenGL is to efficiently do large numbers of matrix ops in hardware. I would be doing in software something OpenGL is designed to do very well in hardware.
Is having a large number of VBOs inefficient, or should I just go ahead and choose option 1? Are there any better ways to deal with the situation of having lots of small objects? Should I instead be using vertex arrays or anything like that?
The most optimal solution if your data is static and consists of one object would be to use display lists and one VBO for only one mesh. Otherwise the general rule is that you want to avoid doing anything other than rendering in the main draw loop.
If you'll never need to add or remove an object after initialization (or morph any object) then it's probably more efficient to bind a single buffer permanently and change the stride/offset values to render different objects.
If you've only got a base set of geometry that will remain static, use a single VBO for that and separate VBOs for the geometry that can be added/removed/morphed.
If you can drop objects in or remove objects at will, each object should have its own VBO to make memory management much simpler.
Some good info is located at: http://www.opengl.org/wiki/Vertex_Specification_Best_Practices
I think that 200 triangles per mesh is not such a small number, and performance with one VBO per mesh may not drop that much. Unfortunately, it depends on the hardware.
One huge buffer will not make a huge performance difference... I think the best option would be to store several (but not all) objects per VBO.
Rendering using one buffer:
There is no problem with that... you simply have one buffer that is bound, and then you can use glDrawArrays with different parameters. For instance, if one mesh consists of 100 verts and the buffer holds 10 of those meshes, you can use
glDrawArrays(GL_TRIANGLES, 0, 100);
glDrawArrays(GL_TRIANGLES, 100, 100);
glDrawArrays(GL_TRIANGLES, ..., 100);
glDrawArrays(GL_TRIANGLES, 900, 100);
That way you minimize buffer changes and are still able to render quite efficiently.
Are those "small" objects the same? Do they have the same geometry but different transformations/materials? If so, it may be worth using instancing.
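The first/count arguments for those calls can be derived from the vertex counts of the meshes stored back to back in the buffer. A small sketch, with the hypothetical names DrawArraysCall and layoutMeshes:

```cpp
#include <vector>

// The `first` and `count` arguments of one glDrawArrays call.
struct DrawArraysCall {
    int first;
    int count;
};

// Given the vertex count of each mesh, in the order the meshes are stored in
// the shared buffer, returns the draw arguments for every mesh.
std::vector<DrawArraysCall> layoutMeshes(const std::vector<int>& vertexCounts) {
    std::vector<DrawArraysCall> calls;
    calls.reserve(vertexCounts.size());
    int first = 0;
    for (int count : vertexCounts) {
        calls.push_back({first, count});
        first += count; // the next mesh starts right after this one
    }
    return calls;
}
```

For ten meshes of 100 vertices each, this yields first values of 0, 100, and so on up to 900, matching the calls listed above; meshes of different sizes work the same way.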