Performance of using a large number of small VBOs

I wanted to know whether OpenGL VBOs are meant to be used only for large geometry arrays, or whether it makes sense to use them even for small arrays. I have code that positions a variety of geometry relative to other geometry, but some of the "leaf" geometry objects are quite small, 10x10 quad spheres and the like (200 triangles apiece). I may have a very large number of these small leaf objects. I want to be able to have a different transform matrix for each of these small leaf objects. It seems I have 2 options:
1. Use a separate VBO for each leaf object. I may end up with a large number of VBOs.
2. I could store data for multiple objects in one VBO, and apply the appropriate transforms myself when changing the data for a given leaf object. This seems strange because part of the point of OpenGL is to efficiently do large numbers of matrix ops in hardware. I would be doing in software something OpenGL is designed to do very well in hardware.
Is having a large number of VBOs inefficient, or should I just go ahead and choose option 1? Are there any better ways to deal with the situation of having lots of small objects? Should I instead be using vertex arrays or anything like that?

The best solution, if your data is static and consists of a single object, would be to use display lists and a single VBO holding just that one mesh (note that display lists were deprecated in OpenGL 3.0 and are not available in core profiles). Otherwise, the general rule is that you want to avoid doing anything other than rendering in the main draw loop.
If you'll never need to add or remove an object after initialization (or morph any object) then it's probably more efficient to bind a single buffer permanently and change the stride/offset values to render different objects.
If you've only got a base set of geometry that will remain static, use a single VBO for that and separate VBOs for the geometry that can be added/removed/morphed.
If you can drop objects in or remove objects at will, each object should have its own VBO to make memory management much simpler.

Some good information is available at: http://www.opengl.org/wiki/Vertex_Specification_Best_Practices
I think that 200 triangles per mesh is not such a small number, and the performance with one VBO for each of those meshes may not drop that much. Unfortunately, it depends on the hardware.
One huge buffer will not give a huge performance difference either. I think the best option would be to store several (but not all) objects per VBO.
Rendering using one buffer:
There is no problem with that: you simply have one buffer bound and then call glDrawArrays with different parameters. For instance, if one mesh consists of 100 vertices and the buffer holds 10 of those meshes, you can use
    glDrawArrays(GL_TRIANGLES, 0, 100);
    glDrawArrays(GL_TRIANGLES, 100, 100);
    glDrawArrays(GL_TRIANGLES, ..., 100);
    glDrawArrays(GL_TRIANGLES, 900, 100);
That way you minimize buffer changes and can still render quite efficiently.
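A minimal sketch of the bookkeeping behind those calls, assuming every mesh is stored contiguously in one vertex buffer in the order it was added. `DrawRange` and `packMeshes` are hypothetical helpers; each resulting pair maps onto the `first` and `count` arguments of `glDrawArrays`.

```cpp
#include <vector>

// One sub-mesh inside a shared vertex buffer: the values that would be
// passed as glDrawArrays(GL_TRIANGLES, first, count).
struct DrawRange {
    int first;  // index of this mesh's first vertex in the buffer
    int count;  // number of vertices belonging to this mesh
};

// Given the vertex count of each mesh, in the order the meshes were appended
// to the buffer, compute where each mesh starts.
std::vector<DrawRange> packMeshes(const std::vector<int>& vertexCounts) {
    std::vector<DrawRange> ranges;
    ranges.reserve(vertexCounts.size());
    int next = 0;  // running offset, in vertices (not bytes)
    for (int count : vertexCounts) {
        ranges.push_back({next, count});
        next += count;
    }
    return ranges;
}
```

For ten meshes of 100 vertices each this yields `first` values 0, 100, 200, ..., 900, matching the calls above.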
Are those "small" objects the same? Do they have the same geometry but different transformations/materials? If so, it may be worth using instancing.

Related

Implementing imposters in three.js

Is there a way to do imposters in three.js - or is that not going to help with performance at all for a scene with >10,000 objects most of them being the same model?
If you have thousands of the same object (with variations of position/size/rotation and perhaps color), then your first priority should be to make sure you don't have thousands of GPU draw calls. A couple of options:
(a) static batching: apply the objects' world transforms to their geometries (geometry.applyMatrix( mesh.matrixWorld )), then merge them with THREE.BufferGeometryUtils.mergeBufferGeometries(). The result can be drawn as a single large mesh. This takes up more memory, but is easier to set up.
(b) GPU instancing: more memory-efficient, but harder to do. See https://threejs.org/examples/webgl_interactive_instances_gpu.html or https://www.npmjs.com/package/three-instanced-mesh.
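Outside three.js, the static-batching step in (a) amounts to something like the rough C++ sketch below: transform each copy's vertex positions by its world matrix (the job `geometry.applyMatrix( mesh.matrixWorld )` does above) and append them to one array that would be uploaded once and drawn with one call. `Mat4`, `transformPoint`, and `mergeWithTransforms` are hypothetical names, and the matrix is assumed to be column-major as in OpenGL/WebGL.

```cpp
#include <array>
#include <vector>

using Vec3 = std::array<float, 3>;
// Column-major 4x4 matrix (OpenGL/WebGL layout): row r, column c is m[c * 4 + r].
using Mat4 = std::array<float, 16>;

// Transform a point by an affine matrix (implicit w = 1).
Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    Vec3 out{};
    for (int r = 0; r < 3; ++r) {
        out[r] = m[0 * 4 + r] * p[0] + m[1 * 4 + r] * p[1] +
                 m[2 * 4 + r] * p[2] + m[3 * 4 + r];
    }
    return out;
}

// Static batching: bake each object's world matrix into its own copy of the
// shared model and concatenate all copies into one vertex array.
std::vector<Vec3> mergeWithTransforms(const std::vector<Vec3>& model,
                                      const std::vector<Mat4>& worlds) {
    std::vector<Vec3> merged;
    merged.reserve(model.size() * worlds.size());
    for (const Mat4& w : worlds) {
        for (const Vec3& p : model) {
            merged.push_back(transformPoint(w, p));
        }
    }
    return merged;
}
```

The memory cost the answer mentions is visible here: the merged array holds one full copy of the model per object.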
Once you've reduced the number of draw calls, profile the application again. If performance is still poor, you can reduce the total vertex count with impostors (or, really, just simpler meshes...). three.js does not generate impostors for you, per the question "Spherical Impostors in three.js".

Very fast boolean difference between two meshes

Let's say I have a static object and a movable object which can be moved and rotated, what is the best way to very quickly calculate the difference of those two meshes?
Precision here is not so important, speed is though, since I have to use it in the update phase of the main loop.
Maybe, given the strict time limit, modifying the static object's vertices and triangles directly is to be preferred. Should voxels be preferred here instead?
EDIT: The use case is an interactive viewer of a wood panel (parallelepiped) and a milling tool (a revolved contour, some like these).
The milling tool can be rotated and can work oriented at varying degrees (5 axes).
EDIT 2: The milling tool may not pierce the wood.
EDIT 3: The panel can be as large as 6000x2000mm and the milling tool can be as little as 3x3mm.
If you need the best possible performance then the generic CSG approach may be too slow for you (but still depending on meshes and target hardware).
You may try to find some specialized algorithm, coded for your specific meshes. Let's say you have two cubes - one is a 'wall' and second is a 'window' - then it's much easier/faster to compute resulting mesh with your custom code, than full CSG. Unfortunately you don't say anything about your meshes.
You may also try to make it a 2D problem, use some simplified meshes to compute the result that will 'look like expected'.
If the movement of your meshes is somehow limited you may be able to precompute full or partial results for different mesh combinations to use at runtime.
You may use some space partitioning like BSP trees or octrees to divide your meshes during a precomputing stage. This way you could split one big problem into many smaller ones that may be faster to compute, or that at least allow a multi-threaded solution.
You mentioned voxels: if you're fine with their look and limitations, you may voxelize both meshes and just read and combine two voxel values instead of one. Then you would triangulate the result using an algorithm like Marching Cubes.
Those are all just some general ideas but we'll need better info to help you more.
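For the voxel idea, the difference itself becomes a per-voxel operation: a voxel keeps material only if it is solid in the panel and not covered by the tool. A minimal sketch, assuming a dense occupancy grid in voxel units and a spherical tool tip; `VoxelGrid` and `subtractSphere` are hypothetical, and surface extraction (e.g. Marching Cubes) is not shown.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dense occupancy grid: 1 = material, 0 = removed. Voxel (x, y, z) is stored
// at index x + nx * (y + ny * z); its center is at (x + 0.5, y + 0.5, z + 0.5).
struct VoxelGrid {
    int nx, ny, nz;
    std::vector<std::uint8_t> solid;

    VoxelGrid(int x, int y, int z)
        : nx(x), ny(y), nz(z), solid(static_cast<std::size_t>(x) * y * z, 1) {}

    std::uint8_t& at(int x, int y, int z) { return solid[x + nx * (y + ny * z)]; }
};

// panel = panel - sphere: clear every voxel whose center lies inside a sphere
// of radius r centered at (cx, cy, cz), all in voxel units. Only the sphere's
// bounding box is visited, so the cost scales with the tool, not the panel.
void subtractSphere(VoxelGrid& g, float cx, float cy, float cz, float r) {
    auto lo = [](float v) { return std::max(0, static_cast<int>(std::floor(v))); };
    const int x0 = lo(cx - r), y0 = lo(cy - r), z0 = lo(cz - r);
    const int x1 = std::min(g.nx - 1, static_cast<int>(std::floor(cx + r)));
    const int y1 = std::min(g.ny - 1, static_cast<int>(std::floor(cy + r)));
    const int z1 = std::min(g.nz - 1, static_cast<int>(std::floor(cz + r)));
    for (int z = z0; z <= z1; ++z) {
        for (int y = y0; y <= y1; ++y) {
            for (int x = x0; x <= x1; ++x) {
                const float dx = x + 0.5f - cx, dy = y + 0.5f - cy, dz = z + 0.5f - cz;
                if (dx * dx + dy * dy + dz * dz <= r * r) {
                    g.at(x, y, z) = 0;
                }
            }
        }
    }
}
```

A real tool would be swept along its path and shaped like the actual cutter; the sphere only keeps the example short.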
EDIT:
With your description it looks like you're modeling some bas-relief, so you may use Relief Mapping to fake this effect. It's based on a height map stored as a texture, so you'd just need to update a few pixels of the texture and render a plane. It should be quite fast compared to other approaches; the downside is that, being based on a height map, it can't represent the undercut shapes that a T-slot or dovetail cutter would create.
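The height-map update for a flat-bottomed cutter could be sketched as follows: each texel under the tool is lowered to the tool's depth, never raised, so repeated passes only remove material. Assumptions: the panel surface is a row-major array of heights in millimetres at a fixed texel size, and the cutter is a vertical flat end mill; `HeightMap` and `millFlat` are hypothetical, and the texture upload (for example a `glTexSubImage2D` of the touched rectangle) is not shown.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Panel surface as a height field: one height (mm) per texel, row-major.
struct HeightMap {
    int w, h;          // size in texels
    float texelSize;   // mm covered by one texel
    std::vector<float> height;

    HeightMap(int width, int rows, float size, float initialHeight)
        : w(width), h(rows), texelSize(size),
          height(static_cast<std::size_t>(width) * rows, initialHeight) {}
};

// Flat end mill at (cx, cy) mm with the given radius, cutting down to `depth`.
// Returns how many texels changed, i.e. how much would need re-uploading.
int millFlat(HeightMap& m, float cx, float cy, float radius, float depth) {
    const int x0 = std::max(0, static_cast<int>(std::floor((cx - radius) / m.texelSize)));
    const int x1 = std::min(m.w - 1, static_cast<int>(std::floor((cx + radius) / m.texelSize)));
    const int y0 = std::max(0, static_cast<int>(std::floor((cy - radius) / m.texelSize)));
    const int y1 = std::min(m.h - 1, static_cast<int>(std::floor((cy + radius) / m.texelSize)));
    int changed = 0;
    for (int y = y0; y <= y1; ++y) {
        for (int x = x0; x <= x1; ++x) {
            const float px = (x + 0.5f) * m.texelSize - cx;  // texel center relative to tool
            const float py = (y + 0.5f) * m.texelSize - cy;
            if (px * px + py * py > radius * radius) continue;  // outside the cutter
            float& cell = m.height[static_cast<std::size_t>(y) * m.w + x];
            if (cell > depth) {  // only ever remove material
                cell = depth;
                ++changed;
            }
        }
    }
    return changed;
}
```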
If you want the real geometry, then I'd start with a simple plane as your panel (no need for full 3D yet, just a front surface) and divide it with a 2D grid. Each grid element should be slightly bigger than the drill size, and every element is a separate mesh. In the frame update you'd cut one, or at most 4, elements that are touched by the drill. Thanks to this grid, all your cutting operations will run on very simple meshes, so they may work at your intended speed. You can also cut all current elements in separate threads. After the cutting is done, you upload to the GPU only the currently modified elements, so you may end up with a quite complex mesh overall but only small modifications per frame.
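With square cells at least as large as the tool diameter, a circular tool footprint can overlap at most a 2x2 block of cells, which is why only one, or at most four, elements need cutting per update. A sketch of that lookup, assuming an axis-aligned grid anchored at the panel origin and using the footprint's bounding square (so it is conservative); `cellsTouched` is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// Return (column, row) of every grid cell overlapped by the bounding square of
// a circular tool of the given radius centered at (x, y). Units: mm.
std::vector<std::pair<int, int>> cellsTouched(float x, float y, float radius,
                                              float cellSize, int cols, int rows) {
    const int c0 = std::max(0, static_cast<int>(std::floor((x - radius) / cellSize)));
    const int c1 = std::min(cols - 1, static_cast<int>(std::floor((x + radius) / cellSize)));
    const int r0 = std::max(0, static_cast<int>(std::floor((y - radius) / cellSize)));
    const int r1 = std::min(rows - 1, static_cast<int>(std::floor((y + radius) / cellSize)));
    std::vector<std::pair<int, int>> cells;
    for (int row = r0; row <= r1; ++row) {
        for (int col = c0; col <= c1; ++col) {
            cells.emplace_back(col, row);  // these meshes get re-cut and re-uploaded
        }
    }
    return cells;
}
```

For a 6000x2000 mm panel with hypothetical 4 mm cells (1500x500 cells) and a 3 mm tool, a cut at (10, 10) touches one cell, while a cut at (8, 8), on a cell corner, touches four.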

How can I properly manage data in modern OpenGL while considering performance?

In modern OpenGL (3.x+), you create buffer objects which contain vertex attributes (such as positions, colors, normals, and texture coordinates) and indices.
These buffers are then assigned to a corresponding vertex array object (VAO) which essentially contains pointers to all of the data as well as the data's format.
There are many tutorials out there on how to create and use a VAO; unfortunately, it isn't clear how VAOs should be used in larger applications or games.
For example, a game might contain many 3D models, and it seems appropriate to separate each model by a different VAO.
On the other hand, a particle system contains many disconnected primitives traveling independently of one another. In this scenario, using a single VAO per system might improve performance in CPU-GPU transfers. However, in this case the primitives need to be translated differently from one another, so it might seem viable to put each particle into its own very small VAO.
Question:
For a large quantity of small data sets (such as a particle system of quads), should all of the data be packed into 1 VAO or divided into many VAOs? What are the performance benefits/drawbacks of each method?
Assuming 1 VAO is used, the only apparent way to translate each independent sub-unit of data is to modify the actual position information and reload it into the GPU. Doing this many times is costly in terms of time performance.
Assuming many VAOs are used, then the GPU must store duplicate formatting information for each VAO. This seems to be costly in terms of space (but I'm not sure if this is necessarily slow).
Side-Note:
Yes, I'm personally interested in managing a particle system. To keep this question more generic, and more useful for others, I am asking about VAO management as a whole. I am curious what management methods are more suitable vs others when considering the type of data being stored and when considering what type of performance is desired (time/space).
VAO creation is described well here:
https://www.opengl.org/wiki/Vertex_Specification
In the case of particles it would be best to use instanced rendering, where you can render all the particles in a single draw call but assign a different position to each one as a per-instance attribute. You can update an existing buffer using glBufferSubData. That way you could update the positions on the CPU side between frames, and then update the buffer.
In more complex examples you can instance whichever attributes you want to.
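The CPU side of that per-frame update could look like the sketch below: advance each particle, then write the positions into one tightly packed float array whose data pointer and byte size are what you would pass to `glBufferSubData`. `Particle`, `stepParticles`, and the simple Euler integration are illustrative assumptions, not part of the answer.

```cpp
#include <cstddef>
#include <vector>

struct Particle {
    float pos[3];
    float vel[3];
};

// Advance all particles by dt (explicit Euler) and pack their positions as
// x0 y0 z0 x1 y1 z1 ... into `packed`, ready for a single buffer update.
// Returns the number of bytes to upload.
std::size_t stepParticles(std::vector<Particle>& particles, float dt,
                          std::vector<float>& packed) {
    packed.resize(particles.size() * 3);
    for (std::size_t i = 0; i < particles.size(); ++i) {
        for (int k = 0; k < 3; ++k) {
            particles[i].pos[k] += particles[i].vel[k] * dt;
            packed[i * 3 + k] = particles[i].pos[k];
        }
    }
    return packed.size() * sizeof(float);
}
```

The upload would then be a single call such as `glBufferSubData(GL_ARRAY_BUFFER, 0, bytes, packed.data())` with the per-instance buffer bound, followed by one instanced draw.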
The way I call instanced rendering and set it up in my code is as follows:
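    // Adds a per-instance vec3 position attribute (slot attribNum) to an existing VAO.
    // CreateVertexArrayBuffer is the author's own helper (not shown); it presumably
    // creates a GL_DYNAMIC_DRAW buffer for numInstances vec3 values and leaves it bound
    // to GL_ARRAY_BUFFER, which glVertexAttribPointer below relies on.
    void CreateInstancedAttrib(unsigned int attribNum, GLuint VAO, GLuint& posVBO, int numInstances){
        glBindVertexArray(VAO);
        posVBO = CreateVertexArrayBuffer(0, sizeof(vec3), numInstances, GL_DYNAMIC_DRAW);
        glEnableVertexAttribArray(attribNum);
        // 3 floats per instance, tightly packed, starting at offset 0
        glVertexAttribPointer(attribNum, 3, GL_FLOAT, GL_FALSE, sizeof(vec3), 0);
        // Divisor 1: advance this attribute once per instance, not once per vertex
        glVertexAttribDivisor(attribNum, 1);
        glBindVertexArray(0);
    }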
Here posVBO receives the buffer that holds the per-instance positions, and the lines that follow set up its attribute layout.
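For reference, `glVertexAttribDivisor(attribNum, 1)` is what turns the attribute into a per-instance one: with divisor 0 an attribute advances per vertex, with divisor N it advances once every N instances. The function below only emulates that indexing rule on the CPU to make it concrete; no GL call is made, `attributeElement` is a hypothetical name, and a base instance of 0 is assumed.

```cpp
// Which element of an attribute array is read for a given vertex/instance,
// following the glVertexAttribDivisor rule (base instance assumed to be 0).
unsigned attributeElement(unsigned vertexIndex, unsigned instanceIndex,
                          unsigned divisor) {
    if (divisor == 0) {
        return vertexIndex;          // ordinary per-vertex attribute
    }
    return instanceIndex / divisor;  // advances every `divisor` instances
}
```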
When rendering:
    void RenderInstancedStaticMesh(const StaticMesh& mesh, MaterialUniforms& uniforms, const vec3* positions){
        for (unsigned int meshNum = 0; meshNum < mesh.m_numMeshes; meshNum++){
            if (mesh.m_meshData[meshNum]->m_hasTexture){
                glBindTexture(GL_TEXTURE_2D, mesh.m_meshData[meshNum]->m_texture);
            }
            // Bind this sub-mesh's VAO (the author stores the VAO name in m_vertexBuffer),
            // then overwrite its per-instance buffer with this frame's positions.
            glBindVertexArray(mesh.m_meshData[meshNum]->m_vertexBuffer);
            glBindBuffer(GL_ARRAY_BUFFER, mesh.m_meshData[meshNum]->m_instancedDataBuffer);
            glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(vec3) * mesh.m_numInstances, positions);
            // Per-sub-mesh material uniforms
            glUniform3fv(uniforms.diffuseUniform, 1, &mesh.m_meshData[meshNum]->m_material.diffuse[0]);
            glUniform3fv(uniforms.specularUniform, 1, &mesh.m_meshData[meshNum]->m_material.specular[0]);
            glUniform3fv(uniforms.ambientUniform, 1, &mesh.m_meshData[meshNum]->m_material.ambient[0]);
            glUniform1f(uniforms.shininessUniform, mesh.m_meshData[meshNum]->m_material.shininess);
            // One draw call renders every instance of this sub-mesh
            glDrawElementsInstanced(GL_TRIANGLES, mesh.m_meshData[meshNum]->m_numFaces * 3,
                                    GL_UNSIGNED_INT, 0, mesh.m_numInstances);
        }
        glBindBuffer(GL_ARRAY_BUFFER, 0);
        glBindVertexArray(0);
    }
That's a lot to take in, but the important calls are glDrawElementsInstanced and glBufferSubData.
If you look up both functions, I'm sure you will come to understand how instanced rendering works.
If you have any more questions, please ask.
The general rule is that you want to minimize the number of draw calls. If you put things into individual VAOs, you have to perform a draw call for each VAO. Switching between VAOs and VBOs also comes with a cost. Don't think of VAOs and VBOs as "model" containers, but as memory pools, where each VBO/VAO should be used to coalesce data with identical properties.
A particle system is the perfect candidate for putting everything into a single VBO/VAO, usually with instanced rendering, where the VBO contains the information about where to place each particle.

Sharing VBOs across multiple mesh objects

I'm working on a very small game engine that uses OpenGL ES 2.0. I'm having a bit of a design issue with integrating VBOs into my Mesh Class.
The problem is that I don't want to instantiate a new VBO for each mesh, and I want the VBO size to be determined by the number of meshes I load into it (not just a fixed size of 2MB or something).
Since there's no realloc function for VBOs, I need to batch load all my vertex data at once. This is ok, since I only have 4 or 5 small meshes. So I created a MeshList class.
I call MeshList.AddMesh(Mesh mesh) and it aggregates the vertex/index data of the mesh object and returns the offsets into the array of vertex data/index data back to the mesh that was added. This way the mesh knows where it is in the VBO (but not which VBO it's in).
However, none of the MeshList data is uploaded into a VBO until I call MeshList.BindToVBO(). But now, none of my meshes know which VBO they're in. So I was thinking of creating an array of pointers in MeshList that point to integer member variables in each Mesh class that would hold the VBO Handle. This way, when BindToVBO() is called, it iterates over the pointer array and updates the VBO Handles in the mesh objects.
I figured, this way it gives me the flexibility of having different mesh objects in different VBOs or all in one VBO. The only concern I have is whether or not this is a good design.
It's not clear to someone glancing at the code that MeshList.BindToVBO() is updating a whole bunch of mesh objects. I mean, MeshList does interact with all of the Mesh objects prior to the BindToVBO() call, but there's nothing explicitly saying that by passing a Mesh object to MeshList.AddMesh(), it's essentially subscribing its VBOHandle members to updates at some point in the future.
I've tried to make this as clear as I can. Let me know if something needs clarification.
Honestly, to me it sounds like a lot of trouble for a dubious payoff. Do you have a reason to believe that putting multiple meshes in the same buffer is going to make a noticeable difference in your performance?
It sounds like premature optimization to me.
Sure, if you have a particle system with 50,000 particles I could see wanting that to be in a shared buffer, but in general I don't know if there's a benefit to storing two arbitrary meshes in the same buffer. It just sounds like a huge potential for bugs and headaches.
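If you do decide to share a buffer, one way to avoid the hidden "subscription" the question worries about is to invert the dependency, sketched here under assumptions: `AddMesh` returns a small handle, and each mesh asks the list for its buffer and offsets at draw time, so `BindToVBO()` never has to update Mesh objects. `MeshList`, `MeshHandle`, and `MeshSlice` below are hypothetical C++ stand-ins, and the unsigned buffer id stands in for a name that real code would get from `glGenBuffers` before calling `glBufferData`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

struct MeshSlice {
    std::size_t firstVertex;  // offset into the shared vertex data
    std::size_t vertexCount;
};

using MeshHandle = std::size_t;  // index into MeshList's slice table

class MeshList {
public:
    explicit MeshList(std::size_t floatsPerVertex) : stride_(floatsPerVertex) {}

    // Append vertex data; the returned handle is all a Mesh needs to keep.
    MeshHandle addMesh(const std::vector<float>& vertices) {
        if (bufferId_ != 0) throw std::logic_error("addMesh after bindToVBO");
        slices_.push_back({data_.size() / stride_, vertices.size() / stride_});
        data_.insert(data_.end(), vertices.begin(), vertices.end());
        return slices_.size() - 1;
    }

    // Stand-in for glGenBuffers + glBufferData: just records the buffer name.
    void bindToVBO(unsigned bufferId) { bufferId_ = bufferId; }

    // Queried by meshes at draw time instead of being pushed into them.
    unsigned bufferId() const { return bufferId_; }
    const MeshSlice& slice(MeshHandle h) const { return slices_.at(h); }

private:
    std::size_t stride_;
    std::vector<float> data_;
    std::vector<MeshSlice> slices_;
    unsigned bufferId_ = 0;  // 0 = not uploaded yet (GL never hands out buffer name 0)
};
```

The dependency is now visible at every call site: a mesh holding a `MeshHandle` obviously needs its `MeshList` to draw.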

How to implement batches using webgl?

I am working on a small game using WebGL. Within this game I have a forest which consists of many (100+) tree objects. Because I only have a few different tree models, I rotate and scale these models differently before I display them.
At the moment I loop over all trees to display them:
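    // for...in would iterate over array keys, not tree objects, so use an index loop
    for (var i = 0; i < trees.length; i++) {
        trees[i].display();
    }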
While the display() method of tree looks like:
    display : function() { // tree
        this.treeModel.setRotation(this.rotation);
        this.treeModel.setScale(this.scale);
        this.treeModel.setPosition(this.position);
        this.treeModel.display();
    }
Many tree objects share the same treeModel object, so I have to set the rotation/scale/position of the model every time before I display it. The rotation/scale/position values are different for every tree.
The display method of treeModel does all the gl stuff:
    display : function() { // treeModel
        // bind texture
        // set uniforms for projection/modelview matrix based on rotation/scale/position
        // bind buffers
        // drawArrays
    }
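The "set uniforms ... based on rotation/scale/position" step usually means building a model matrix as translation * rotation * scale. A minimal sketch of that composition, written in C++ but identical in JavaScript, assuming rotation about the Y axis only and column-major storage, which is the layout `uniformMatrix4fv` expects since WebGL requires `transpose = false`; `modelMatrix` is a hypothetical helper.

```cpp
#include <array>
#include <cmath>

using Mat4 = std::array<float, 16>;  // column-major: element (row, col) is m[col * 4 + row]

// Model matrix = Translate(tx, ty, tz) * RotateY(angleRad) * Scale(sx, sy, sz).
Mat4 modelMatrix(float tx, float ty, float tz, float angleRad,
                 float sx, float sy, float sz) {
    const float c = std::cos(angleRad);
    const float s = std::sin(angleRad);
    return Mat4{
        c * sx, 0.0f, -s * sx, 0.0f,   // column 0: rotated, scaled x axis
        0.0f,   sy,   0.0f,    0.0f,   // column 1: scaled y axis
        s * sz, 0.0f, c * sz,  0.0f,   // column 2: rotated, scaled z axis
        tx,     ty,   tz,      1.0f};  // column 3: translation
}
```

Only these 16 floats change per tree; the geometry buffers can stay bound, which is what makes the per-tree draw calls cheap.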
All tree models use the same shader but can use different textures.
Because a single tree model consists of only a few triangles, I want to combine all trees into one VBO and display the whole forest with one drawArrays() call.
Some assumptions to make talking about numbers easier:
There are 250 trees to display
There are 5 different tree models
Every tree model has 50 triangles
Questions I have:
At the moment I have 5 buffers that are 50 * 3 * 8 (position + normal + texCoord) * floatSize bytes large. When I want to display all trees with one VBO, I would have a buffer of 250 * 50 * 3 * 8 * floatSize bytes. I think I can't use an index buffer because I have different position values for every tree (computed from the position value of the tree model and the tree position/scale/rotation). Is this correct, or is there still a way I can use index buffers to reduce the buffer size at least a bit? Maybe there are other ways to optimize this?
How to handle different textures of the tree models? I can bind all textures to different texture units but how can I decide within the shader which texture should be used for the fragment that is currently displayed?
When I want to add a new tree (or any other kind of object) to this buffer at runtime: Do I have to create a new buffer and copy the content? I think new values can't be added by using glMapBuffer. Is this correct?
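Plugging the numbers from the first question in, with floatSize = 4 bytes (WebGL's gl.FLOAT is a 32-bit float):

$$
\begin{aligned}
\text{five per-model buffers: } & 5 \times 50 \times 3 \times 8 \times 4\ \text{B} = 24{,}000\ \text{B} \\
\text{one merged forest buffer: } & 250 \times 50 \times 3 \times 8 \times 4\ \text{B} = 1{,}200{,}000\ \text{B} \approx 1.2\ \text{MB} \\
\text{vertices in the merged buffer: } & 250 \times 50 \times 3 = 37{,}500
\end{aligned}
$$

So the merged buffer is 50 times the size of the five model buffers combined (250 trees instead of 5 models), and its 37,500 vertices are still below the 65,536 that 16-bit indices can address.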
Element index buffers in WebGL 1 use 16-bit indices unless the OES_element_index_uint extension is available, so they can only address vertices up to index 65535; for batches larger than that you need to use drawArrays instead. It's usually not a big loss.
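You can add trees to the end of the buffers using gl.bufferSubData, as long as the buffer was originally allocated (with gl.bufferData) large enough to hold them; bufferSubData cannot grow a buffer.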
If your textures are reasonably sized (like 128x128 or 256x256), you can probably merge them into one big texture and handle the whole thing with the UV coordinates. If not, you can add another attribute saying which texture the vertex belongs to and have a condition in the shader, or alternatively an array of sampler2Ds (not sure that works, never tried it). Remember that conditions in shaders can be pretty slow.
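Merging textures into one atlas means each model's original UVs in [0, 1] must be squeezed into that model's tile. A sketch assuming a square atlas split into an n x n grid of equally sized tiles, with tiles numbered row by row; `atlasUV` is hypothetical, and it ignores the padding needed to avoid bleeding between tiles when mipmapping.

```cpp
#include <array>

// Map a model-local UV in [0,1]^2 into tile `tile` of an atlas made of
// tilesPerRow x tilesPerRow equal tiles (tile 0 starts at u = 0, v = 0).
std::array<float, 2> atlasUV(float u, float v, int tile, int tilesPerRow) {
    const float scale = 1.0f / static_cast<float>(tilesPerRow);
    const int col = tile % tilesPerRow;
    const int row = tile / tilesPerRow;
    return {(static_cast<float>(col) + u) * scale,
            (static_cast<float>(row) + v) * scale};
}
```

For example, four 256x256 tree textures in one 512x512 atlas is a 2x2 layout, and the center (0.5, 0.5) of tile 3 maps to (0.75, 0.75). The remapping can be baked into each tree's UVs once, so the shader needs no condition at all.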
If you decide to stick with your current solution, make sure to sort the trees so that the ones using the same textures are rendered one after another; keeping state switching down is always essential.
A few thoughts:
Once you plant a tree in your world, do you ever modify it? Will it animate at all? Or is it just static geometry? If it's truly static, you could always build a single buffer with several copies of each tree. As you append trees, first apply (in JavaScript) that instance's world transform to the vertices. If you are using triangle strips, you can link trees together using degenerate triangles.
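Linking strips that way can be sketched on the index lists, assuming both strips are non-empty index sequences meant for one TRIANGLE_STRIP draw: repeat the last index of the first strip and the first index of the second, which creates zero-area triangles the rasterizer discards. When the first strip has an odd length, one extra repeat keeps the second strip's first triangle at an even position, so its winding order is preserved. `joinStrips` is a hypothetical helper.

```cpp
#include <vector>

// Concatenate triangle strip `b` onto strip `a` (both non-empty) using
// degenerate triangles, so both can be drawn with one TRIANGLE_STRIP call.
std::vector<unsigned> joinStrips(const std::vector<unsigned>& a,
                                 const std::vector<unsigned>& b) {
    std::vector<unsigned> out(a);
    out.push_back(a.back());      // repeat last index of a
    if (a.size() % 2 == 1) {
        out.push_back(a.back());  // keep b's first triangle at an even position
    }
    out.push_back(b.front());     // repeat first index of b
    out.insert(out.end(), b.begin(), b.end());
    return out;
}
```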
You could roll your own pseudo-instanced drawing:
1. Encode an instance ID in the array buffer. Just set it to the same value for all vertices that are part of the same tree instance. GLSL ES 1.00 (used by WebGL 1) doesn't allow integer vertex attributes, so you will need to bring the ID in as a float but use it as an int. Since it's a float, it can carry small precision errors (and it gets interpolated if you pass it on to the fragment shader), so the value may fluctuate slightly, but simply rounding to the nearest integer fixes that right up.
2. Use a separate texture (which I will call the data texture) to encode all the per-instance information. In your vertex shader, look at the instance ID of the current vertex and use it to compute a texture coordinate in the data texture. Pull out whatever you need to transform the current vertex, and apply it. Note that this needs vertex texture fetch, which a WebGL 1 implementation may not support (it can report MAX_VERTEX_TEXTURE_IMAGE_UNITS as 0). I think this is called a "dependent texture read", which is generally frowned upon because it can cause performance issues, but it might help you batch your geometry, which can help solve performance issues. If you're interested, you'll have to try it and see what happens.
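The lookup in these two steps boils down to two small computations, shown here in C++ for clarity although in the technique itself they run in the vertex shader: rounding the float ID back to an integer, and turning that integer into the texture coordinate of the texel holding the instance's data. Assumptions: one texel per instance, a data texture of `width` x `height` texels filled row by row, and sampling at texel centers; `instanceIdFromFloat` and `instanceTexCoord` are hypothetical names.

```cpp
#include <array>
#include <cmath>

// Recover an integer instance ID from a float that may carry small errors
// (e.g. 4.9999 or 5.0001 for ID 5).
int instanceIdFromFloat(float id) {
    return static_cast<int>(std::floor(id + 0.5f));
}

// Texture coordinate of the center of the texel holding instance `id`'s data,
// for a data texture of width x height texels filled row by row.
std::array<float, 2> instanceTexCoord(int id, int width, int height) {
    const int x = id % width;
    const int y = id / width;
    return {(static_cast<float>(x) + 0.5f) / static_cast<float>(width),
            (static_cast<float>(y) + 0.5f) / static_cast<float>(height)};
}
```

Sampling at texel centers matters: with nearest filtering a coordinate on a texel edge could land in the neighboring instance's data.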
Hope for an extension to support real instanced drawing. (WebGL 1 now has one, ANGLE_instanced_arrays, and WebGL 2 supports instanced drawing natively.)
Your current approach isn't so bad. I'd say: Stick with it until you hit some wall.
50 triangles is already a reasonable batch size for a single drawElements/drawArrays call. It's not optimal, but also not so bad. So for every tree, change parameters like location, texture and maybe shape through uniforms, then do a draw call for that tree. A total of 250 drawElements calls isn't so bad either.
So I'd use one single VBO that contains all the used tree geometry variants. I'd actually split up the trees into building blocks, so that I could recombine them for added variety. And for each tree set appropriate offsets into the VBO before calling drawArrays or drawElements.
Also don't forget that you can do very cheap view-frustum culling for each tree.
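A cheap per-tree culling test, sketched under assumptions: each tree has a bounding sphere, and the view frustum is given as six planes `(a, b, c, d)` whose normals point into the frustum and are normalized, so `a*x + b*y + c*z + d` is a signed distance. A tree can be skipped if its sphere lies entirely behind any plane. `Plane` and `sphereVisible` are hypothetical, and extracting the planes from the view-projection matrix is not shown.

```cpp
#include <array>

struct Plane { float a, b, c, d; };  // inside when a*x + b*y + c*z + d >= 0

// Conservative test: returns false only if the sphere is completely outside
// at least one plane, so a few invisible spheres near frustum corners still pass.
bool sphereVisible(const std::array<Plane, 6>& planes,
                   float x, float y, float z, float radius) {
    for (const Plane& p : planes) {
        if (p.a * x + p.b * y + p.c * z + p.d < -radius) {
            return false;
        }
    }
    return true;
}
```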

Resources