I'm new to OpenGL ES and am using a VBO to speed up the rendering of a large number of objects. Currently I query a quadtree to determine which objects should appear in the viewport and use that data to update my VBO, like so:
glBufferData(
    GL_ARRAY_BUFFER,
    numObjects * sizeof(ObjectStruct),
    objects,
    GL_STREAM_DRAW
);
But is this the most efficient way of handling things? Should I instead avoid querying the quadtree and simply generate a static VBO containing information on all of my objects? If I do this, will GL be intelligent enough to cull objects which are positioned outside of the viewport?
GL won't know what's outside of the view frustum until vertex transformations have been executed. Primitives outside of the view frustum will then be culled to avoid redundant operations later in the pipeline. This does, however, mean that the driver and GPU have redundantly processed draws that have no effect on the final image.
I'd recommend avoiding regular VBO updates, as each will either cause the graphics driver to stall or cause additional memory allocations under the hood (enabling the GPU to continue rendering previous frames without interruption). I've written a blog post on the subject which may help: http://blog.imgtec.com/powervr/how-to-improve-your-renderer-on-powervr-based-platforms
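If you do need per-frame uploads, one common mitigation (a general pattern, not something specific to the linked post) is to rotate through a small ring of stream buffers, so that each frame writes into a buffer the GPU is most likely no longer reading. A minimal sketch, reusing your objects/numObjects/ObjectStruct names; the ring setup itself is illustrative:

#define STREAM_VBO_COUNT 3

static GLuint streamVbos[STREAM_VBO_COUNT]; /* created once with glGenBuffers */
static int    frameIndex = 0;

void uploadFrameData(const ObjectStruct *objects, int numObjects)
{
    GLuint vbo = streamVbos[frameIndex];
    frameIndex = (frameIndex + 1) % STREAM_VBO_COUNT;

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, numObjects * sizeof(ObjectStruct), objects, GL_STREAM_DRAW);
    /* ... set vertex attribute pointers relative to this buffer and draw ... */
}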
The best approach depends on your use case. Ideally, you would batch a number of objects within a quadtree/octree node into a single draw. Doing so would enable your engine to reduce the amount of rendering work submitted without incurring the cost of VBO modifications.
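For the static-VBO route, a rough sketch of what per-node batching could look like, assuming each quadtree node records where its objects live in the shared buffer (the VisibleNode bookkeeping and the function are hypothetical):

struct VisibleNode {
    GLint   firstVertex;   /* first vertex of this node's objects in the static VBO */
    GLsizei vertexCount;   /* vertex count for the whole node */
};

void drawVisibleNodes(GLuint staticVbo, const VisibleNode *visibleNodes, int nodeCount)
{
    glBindBuffer(GL_ARRAY_BUFFER, staticVbo);
    /* ... glVertexAttribPointer setup for the ObjectStruct layout ... */

    /* One draw per visible node: no per-frame uploads, and nodes the quadtree
       query rejects are never submitted to the GPU at all. */
    for (int i = 0; i < nodeCount; ++i)
        glDrawArrays(GL_TRIANGLES, visibleNodes[i].firstVertex, visibleNodes[i].vertexCount);
}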
I have an existing OpenGL ES 3.1 application that renders a scene to an FBO with color and depth/stencil attachments. It uses the usual methods for drawing (glBindBuffer, glDrawArrays, glBlend*, glStencil*, etc.). My task is now to create a depth-only pass that fills the depth attachment with the same values as the main pass.
My question is: what is the minimum number of steps necessary to achieve this and avoid the GPU doing superfluous work (unnecessary shader invocations, etc.)? Is deactivating the color attachment enough, or do I also have to set null shaders, disable blending, etc.?
I assume you need this before the main pass runs; otherwise you would just keep the main pass depth.
Preflight
1. Create specialized buffers that contain only the mesh data needed to compute position (deinterleaved from all non-position data).
2. Create specialized vertex shaders that compute only the output position.
3. Link programs with the simplest valid fragment shader (a sketch follows this list).
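A sketch of what such a specialized depth-only program could look like, with the ES 3.1 GLSL shown as C string literals (the attribute and uniform names are assumptions):

static const char *depthOnlyVS =
    "#version 310 es\n"
    "layout(location = 0) in vec4 a_position;\n"
    "uniform mat4 u_mvp;\n"
    "void main() { gl_Position = u_mvp * a_position; }\n";

/* Simplest valid fragment shader: depth comes from rasterization, not shader code. */
static const char *depthOnlyFS =
    "#version 310 es\n"
    "void main() { }\n";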
Rendering
1. Render the depth-only pass using the specialized buffers and shaders, masking out all color writes (see the state sketch after this list).
2. Render the main pass with the full buffers and shaders.
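A minimal sketch of the state around the two passes; fbo, depthOnlyProgram, and the draw calls themselves are assumed to exist elsewhere:

/* 1. Depth-only pass: mask all color writes, keep depth test and depth writes on. */
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glEnable(GL_DEPTH_TEST);
glDepthMask(GL_TRUE);
glDepthFunc(GL_LESS);
glUseProgram(depthOnlyProgram);   /* position-only vertex shader, empty fragment shader */
/* ... draw the scene using the position-only buffers ... */

/* 2. Main pass: restore color writes and switch back to the full programs and buffers. */
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
/* ... draw the scene with the full buffers and shaders ... */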
Options
At rendering step (2) above it might be beneficial to load the depth-only pass results as the starting depth for the main pass. This will give you better early-ZS test accuracy, at the expense of a readback of the depth values. Most mobile GPUs have hidden surface removal, so this isn't always going to be a net gain - it depends on your content, target GPU, and how good your front-to-back draw order is.
You probably want to use the specialized buffers (position data interleaved in one buffer region, non-position interleaved in a second) for the main draw, as many GPUs will optimize out the non-position calculations if the primitive is culled.
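One possible layout for those specialized buffers, sketched below (attribute locations, offsets, and strides are illustrative): region A holds tightly packed positions and is the only data the depth-only pass touches, while region B holds the interleaved non-position attributes used by the main pass.

glBindBuffer(GL_ARRAY_BUFFER, meshVbo);

/* Region A: tightly packed vec3 positions. */
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat), (const void *)0);
glEnableVertexAttribArray(0);

/* Region B: interleaved normal + uv, starting after all the positions (main pass only). */
GLsizeiptr regionB = (GLsizeiptr)vertexCount * 3 * sizeof(GLfloat);
GLsizei    strideB = (3 + 2) * sizeof(GLfloat);
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, strideB, (const void *)regionB);
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, strideB, (const void *)(regionB + 3 * sizeof(GLfloat)));
glEnableVertexAttribArray(1);
glEnableVertexAttribArray(2);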
The specialized buffers and optimized shaders can also be used for shadow mapping, and other such depth-only techniques.
I'm combining vertex data that has the same format into a single VBO, assigning vertex attributes based on the material that these objects use, and rendering them with a single glDrawArrays() call.
It is all working out great until I have to disable some objects (say object1) from being rendered at runtime. Is this even possible, assuming I've already set up all the vertex attributes and stuff? Would it be better not to use batching at all, and have a VBO/VAO per object (then, if an object is disabled, just don't call glDraw*() on it)?
Batching requires putting all of your data in one buffer, but it is not limited to that. Batching is about reducing the number of draw calls; putting your data in one buffer is necessary for that, but not sufficient.
Putting all of your vertex data in one buffer has performance advantages on its own, relative to having to switch buffers and vertex formats between draws. You don't need to go all the way to batching everything into a single draw call to improve performance over using a buffer for each individual object.
In OpenGL, as discussed in this video, the primary cost of multiple draw calls isn't the draw call itself. It's the state changes you usually do between draw calls.
You've put your vertex data in the same buffer, and you must have managed to eliminate state changes between objects if you could render everything with one draw call. At that point, you've already gained most of the performance you're going to. Accept that and move on to other, lower-hanging fruit.
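One way to keep the shared buffer and VAO while skipping objects is to fall back to one cheap draw per visible object, with no state changes in between. A rough sketch, where the per-object record and flag are hypothetical:

struct BatchedObject {
    GLint   first;     /* first vertex of this object in the shared VBO */
    GLsizei count;     /* number of vertices belonging to this object */
    bool    enabled;   /* false = skip it this frame */
};

void drawBatch(GLuint sharedVao, const BatchedObject *objects, int objectCount)
{
    glBindVertexArray(sharedVao);   /* buffer and attribute state set up once */
    for (int i = 0; i < objectCount; ++i) {
        if (!objects[i].enabled)
            continue;               /* disabled objects simply aren't drawn */
        glDrawArrays(GL_TRIANGLES, objects[i].first, objects[i].count);
    }
}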
I'm making a WebGL game and eventually came up with a pretty convenient concept of object templates, where game objects of the same kind (say, characters of the same race) use the same template (meaning: buffers, attributes and shader program) and are instanced from that template by specifying a set of uniforms (which are, in fact, the most common difference between same-kind objects: model matrix, textures, bone positions, etc.). For making independent objects with their own deep copy of buffers, I just deep-copy and re-initialize the original template and start instantiating new objects from it.
But after that I started having doubts. Say, if I start morphing objects by explicitly editing their vertices, this approach will require me to make a separate template for every object of that kind (otherwise they would all morph in exactly the same phase). Which is probably fine for this particular case, because I'll most likely need to recalculate normals and even texture coordinates, which means most of the buffers.
But what if I'm missing some very common use of attributes, say, blood decals, which would require me to update only a small piece of the buffer? In that case, it would be much more reasonable to have two buffers for each object: a common one that is shared by them all, and one for blood decals that is unique to every single one of them. And, as blood usually gets spilled on everything, this sounds pretty reasonable: we would save a lot of space by storing vertices, normals and such without unnecessary duplication.
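To make that concrete, a partial update of a small per-object buffer could look roughly like the following, shown with GL ES-style calls (WebGL's equivalent is gl.bufferSubData); all the names are made up:

/* The shared template geometry stays untouched; only this object's small
   per-vertex "decal weight" buffer gets patched in place. */
glBindBuffer(GL_ARRAY_BUFFER, objectDecalVbo);
glBufferSubData(GL_ARRAY_BUFFER,
                firstAffectedVertex * sizeof(GLfloat),   /* byte offset of the patch */
                affectedVertexCount * sizeof(GLfloat),   /* only the vertices that were hit */
                updatedDecalWeights);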
I haven't tried implementing decals yet, so honestly I'm not even sure whether implementing them using vertex painting (textured or not) is the right choice. But I'm also pretty sure there are some commonly used attributes aside from vertices, normals and texture coordinates.
Here are some that I managed to come up with myself:
decals (probably better to be modelled as separate objects?)
bullet holes and such (same as decals maybe?)
Any thoughts?
UPD: as all this might sound confusing, I want to clarify: I do understand that using as few buffers as possible is a good thing; this is exactly why I'm trying to use this template concept. My question is: what are the possible cases where using a single buffer and a single element buffer (with both of them shared between similar objects) for a template is going to stab me in the back?
Keeping a giant chunk of data on the card that won't change is incredibly useful for saving bandwidth. Additionally, you probably won't be directly changing the vertex positions once they are on the card. Instead you will probably morph them with passed-in uniforms in the vertex shader through skeletal animation. Read about it here: Skeletal Animation
Do keep in mind, though, that in keyframe animation with meshes, you would keep a bunch of buffers on the card, each holding a different keyframe pose of the animation. You would then bind whichever two keyframes you want to interpolate between as attributes and blend between them (you can use more than two). Keyframe Animation
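A minimal vertex shader sketch of that blending, shown as a GLSL ES 1.00 source string (the attribute and uniform names are made up):

static const char *keyframeVS =
    "attribute vec3 a_positionFrameA;\n"
    "attribute vec3 a_positionFrameB;\n"
    "uniform float u_blend;            // 0.0 = frame A, 1.0 = frame B\n"
    "uniform mat4  u_modelViewProjection;\n"
    "void main() {\n"
    "    vec3 p = mix(a_positionFrameA, a_positionFrameB, u_blend);\n"
    "    gl_Position = u_modelViewProjection * vec4(p, 1.0);\n"
    "}\n";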
Additionally, with the introduction of transform feedback (no, you don't get to use it in WebGL; it became core in OpenGL 3.0, and WebGL is based on OpenGL ES 2.0, which is based on OpenGL 2.0), you can start keeping calculated data GPU-side. In other words, you can do a giant particle system simulation in the vertex or geometry shader, store the calculated data in another buffer, and then use that buffer in the next frame without a round trip from the GPU to the CPU. Read about it here: Transform Feedback and here: Transform Feedback how to
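For reference, the desktop OpenGL 3.0+ setup looks roughly like this (a sketch, not WebGL; program, tfBuffer, and particleCount are assumed to exist, and "outPosition" is a vertex shader output chosen purely for illustration):

static const char *captured[] = { "outPosition" };
glTransformFeedbackVaryings(program, 1, captured, GL_INTERLEAVED_ATTRIBS);
glLinkProgram(program);                /* must (re)link after declaring the captured varyings */

glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, tfBuffer);
glEnable(GL_RASTERIZER_DISCARD);       /* pure simulation pass: skip rasterization */
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, particleCount);
glEndTransformFeedback();
glDisable(GL_RASTERIZER_DISCARD);
/* Next frame: bind tfBuffer as the attribute source - no CPU round trip needed. */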
In general, you don't want to touch buffers once they are on the card, especially not every frame. Instead, load several up front and point your shaders' attributes at that data.
I'm working on a very small game engine that uses OpenGL ES 2.0. I'm having a bit of a design issue with integrating VBOs into my Mesh Class.
The problem is that I don't want to instantiate a new VBO for each mesh, and I want the VBO size to be determined by the number of meshes I load into it (not just a fixed size of 2MB or something).
Since there's no realloc function for VBOs, I need to batch load all my vertex data at once. This is ok, since I only have 4 or 5 small meshes. So I created a MeshList class.
I call MeshList.AddMesh(Mesh mesh) and it aggregates the vertex/index data of the mesh object and returns the offsets into the array of vertex data/index data back to the mesh that was added. This way the mesh knows where it is in the VBO (but not which VBO it's in).
However, none of the MeshList data is uploaded into a VBO until I call MeshList.BindToVBO(). But now, none of my meshes know which VBO they're in. So I was thinking of creating an array of pointers in MeshList that point to integer member variables in each Mesh class that would hold the VBO Handle. This way, when BindToVBO() is called, it iterates over the pointer array and updates the VBO Handles in the mesh objects.
I figured this way gives me the flexibility of having different mesh objects in different VBOs, or all in one VBO. The only concern I have is whether or not this is a good design.
It's not clear to someone glancing at the code that MeshList.BindToVBO() is updating a whole bunch of mesh objects. I mean, MeshList does interact with all of the Mesh objects prior to the BindToVBO() call, but there's nothing explicitly saying that by passing a Mesh object to MeshList.AddMesh(), it's essentially subscribing its VBOHandle member to updates at some point in the future.
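To make the shape of that design concrete, here is a minimal sketch of it (the types, names, and use of std::vector are illustrative, not the actual code):

#include <vector>

struct Mesh {
    size_t vertexOffset = 0;   // byte offset into the shared VBO
    GLuint vboHandle    = 0;   // filled in later by MeshList::BindToVBO()
};

class MeshList {
public:
    void AddMesh(Mesh& mesh, const unsigned char* verts, size_t vertBytes) {
        mesh.vertexOffset = m_vertexData.size();
        m_vertexData.insert(m_vertexData.end(), verts, verts + vertBytes);
        m_meshes.push_back(&mesh);                  // the implicit "subscription"
    }

    void BindToVBO() {
        glGenBuffers(1, &m_vbo);
        glBindBuffer(GL_ARRAY_BUFFER, m_vbo);
        glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)m_vertexData.size(),
                     m_vertexData.data(), GL_STATIC_DRAW);
        for (Mesh* m : m_meshes)                    // every added mesh now learns
            m->vboHandle = m_vbo;                   // which VBO it lives in
    }

private:
    std::vector<unsigned char> m_vertexData;
    std::vector<Mesh*>         m_meshes;
    GLuint                     m_vbo = 0;
};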
I've tried to make this as clear as I can. Let me know if something needs clarification.
Honestly, to me this sounds like a lot of trouble for a dubious payoff. Do you have a reason to believe that putting multiple meshes in the same buffer is going to make a noticeable difference in your performance?
It sounds like premature optimization to me.
Sure, if you have a particle system with 50,000 particles I could see wanting that to be in a shared buffer, but in general I don't know if there's a benefit to storing two arbitrary meshes in the same buffer. It just sounds like a huge potential for bugs and headaches.
I'm new to OpenGL ES 2.0 with its programmable pipeline, and I'm porting an application that renders many objects, all with different textures.
So this will require calling glDrawArrays for each object and changing textures between calls? Or is there another way to draw multiple objects with different textures with a single glDrawArrays call?
I'm asking because I noticed that doing many calls to glDrawArrays was MUCH slower when I tried to use them instead of glBegin/glEnd with 'desktop' OpenGL.
I'm rendering map tiles, so ALL the textures are different; they are dynamically loaded (so I can't spend much time processing them, as I could if they were loaded once), and they are also quite large (up to 512x512).
Unfortunately, there is not a simple built-in way to apply multiple textures in a single batch glDrawArrays call. There are, however, ways to make it work. One of the most common strategies is known as a Texture Atlas. Basically, the idea is to combine many images together into one larger texture, with each sub-image occupying a known sub-rectangle of the texture. When you map those onto your primitives, you supply the coordinates of the sub-rectangle corresponding to the image you want to display.
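For instance, a sketch of the bookkeeping an atlas typically needs (the struct and function names are illustrative): each sub-image records its placement in normalized atlas coordinates, and a tile's local texture coordinates are remapped into that sub-rectangle.

struct AtlasRegion {
    float u, v;   /* lower-left corner of the sub-image within the atlas (0..1) */
    float w, h;   /* sub-image size as a fraction of the whole atlas */
};

void localToAtlasUV(const AtlasRegion *r, float localU, float localV,
                    float *outU, float *outV)
{
    *outU = r->u + localU * r->w;
    *outV = r->v + localV * r->h;
}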
A texture atlas will work in a large number of cases, but it can be comparatively complex to set up. If you don't have to use a different texture for every single object, the first thing to try would be to simply batch together as many primitives as possible that use the same texture.
If you were not using OpenGL ES, you might also look into using Texture Arrays, if your textures are all of the same size.