What are the non-trivial use-cases of attributes in WebGL/OpenGL in general? - opengl-es

I'm making a WebGL game and eventually came up with a pretty convenient concept of object templates: game objects of the same kind (say, characters of the same race) use the same template (meaning the same buffers, attributes and shader program) and are instanced from that template by specifying a set of uniforms (which are, in fact, the most common difference between same-kind objects: model matrix, textures, bone positions, etc.). To make independent objects with their own deep copy of the buffers, I just deep-copy and re-initialize the original template and start instantiating new objects from it.
But after that I started having doubts. Say, if I start morphing objects by explicitly editing their vertices, this approach will require a separate template for every object of that kind (otherwise they would all morph in exactly the same phase). Which is probably fine for this particular case, because I'll most likely need to recalculate normals and even texture coordinates as well, which means most of the buffers anyway.
But what if I'm missing some very common use of attributes, say blood decals, which would require me to update only a small piece of a buffer? In that case it would be much more reasonable to have two buffers per object: a common one shared by them all, and one for the blood decals, unique to every single one of them. And as blood usually gets spilled on everything, this sounds pretty reasonable, because we would save a lot of space by not duplicating vertices, normals and so on unnecessarily.
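Roughly, what I imagine that split would look like in OpenGL ES 2.0 / WebGL terms (this is just a sketch of my idea; the buffer names, attribute locations and the per-vertex decal weight are placeholders):

/* Shared geometry (positions, normals, UVs) lives in one VBO used by every
   object built from this template; shown here with just positions for brevity. */
glBindBuffer(GL_ARRAY_BUFFER, templateGeometryVBO);
glVertexAttribPointer(posLoc, 3, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(posLoc);

/* Per-object decal data lives in a tiny second VBO, unique to this object.
   Only the vertices a new blood splat touches would ever get re-uploaded.   */
glBindBuffer(GL_ARRAY_BUFFER, myDecalWeightVBO);
glBufferSubData(GL_ARRAY_BUFFER, dirtyOffsetBytes, dirtySizeBytes, dirtyWeights);
glVertexAttribPointer(decalLoc, 1, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(decalLoc);

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, templateIndexBuffer);
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);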
I haven't tried implementing decals yet, so honestly I'm not even sure whether implementing them via vertex painting (textured or not) is the right choice. But I'm also pretty sure there are some commonly used attributes aside from vertices, normals and texture coordinates.
Here are some that I managed to come up with myself:
decals (probably better to be modelled as separate objects?)
bullet holes and such (same as decals maybe?)
Any thoughts?
UPD: as all this might sound confusing, I want to clarify: I do understand that using as few buffers as possible is a good thing; that is exactly why I'm trying to use this template concept. My question is: what are the possible cases where using a single vertex buffer and a single element buffer (both shared between similar objects) per template is going to stab me in the back?

Keeping a giant chunk of data that won't change on the card is incredibly useful for saving bandwidth. Additionally, you probably won't be directly changing the vertex positions once they are on the card; instead you will probably morph them in the vertex shader with uniforms you pass in, as in skeletal animation. Read about it here: Skeletal Animation
Do keep in mind, though, that with key-frame animation of meshes you would keep a bunch of buffers on the card, each holding the mesh in a different key-frame pose of the animation. You would then load whichever two key frames you want to interpolate between as attributes and blend between them (you can use more than two). Keyframe Animation
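A minimal sketch of that setup against the GL ES 2.0 C API; the pose buffers, attribute locations and the blend uniform are all illustrative:

/* Bind two key-frame poses of the same mesh as two vertex attributes and let
   the vertex shader blend them: pos = mix(positionA, positionB, u_blend).    */
glBindBuffer(GL_ARRAY_BUFFER, posePositionVBO[frameA]);
glVertexAttribPointer(posALoc, 3, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(posALoc);

glBindBuffer(GL_ARRAY_BUFFER, posePositionVBO[frameB]);
glVertexAttribPointer(posBLoc, 3, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(posBLoc);

glUniform1f(blendLoc, t);   /* t runs from 0.0 (pose A) to 1.0 (pose B) */
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);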
Additionally, with the introduction of transform feedback (no, you don't get to use it in WebGL: it became core in OpenGL 3.0, and WebGL is based on OpenGL ES 2.0, which is based on OpenGL 2.0), you can start keeping calculated data on the GPU side. In other words, you can run a giant particle-system simulation in the vertex or geometry shader and store the calculated data into another buffer, then use that buffer in the next frame without a round trip from the GPU to the CPU. Read about it here: Transform Feedback and here: Transform Feedback how to
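A rough sketch of that loop on desktop GL 3.x (again, not WebGL); it assumes a VAO is bound and that the program writes a varying named outPosition, and the particle buffers and counts are illustrative:

const char *varyings[] = { "outPosition" };
glTransformFeedbackVaryings(prog, 1, varyings, GL_INTERLEAVED_ATTRIBS);
glLinkProgram(prog);                       /* must (re)link after declaring varyings */

glUseProgram(prog);
glBindBuffer(GL_ARRAY_BUFFER, bufA);       /* read last frame's particle state */
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(0);
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, bufB);   /* write new state here */
glEnable(GL_RASTERIZER_DISCARD);           /* simulation pass only, nothing drawn */
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, particleCount);
glEndTransformFeedback();
glDisable(GL_RASTERIZER_DISCARD);
/* next frame: swap bufA and bufB and repeat, no GPU-to-CPU round trip needed */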
In general, you don't want to touch buffers once they are on the card, especially not every frame. Instead, load several and use pointers to that data in your shaders as attributes.

Related

Is there a way to create simple animations "on the fly" in modern OpenGL?

I think this requires a bit of background information:
I have been modding Minecraft for a while now, but I always wanted to make my own game, so I started digging into the freshly released LWJGL3 to actually get things done. Yes, I know it's a bit low level and I should use an engine and so on... indeed, I already tried some engines and they never quite matched what I want to do, so I decided to tackle the problem at its root.
So far, I kind of understand how to render meshes, move the "camera", etc. and I'm willing to take the learning curve.
But the thing is, at some point all the tutorials start to explain how to load models and create skeletal animations and so on... but I think I do not really want to go that way. A lot of things about working with Minecraft code were awful, but I liked how I could create models and animations from Java code. Sure, it did not look super realistic, but since I'm not great with Blender either, I doubt having "classic" models and animations would help. Anyway, in that code I could rotate a box around to make a creature look at a player, and I could use a sine function to move legs and arms (or wings, in my case), and that worked, since Minecraft used immediate mode and Java could directly tell the graphics card where to draw each vertex.
So, actual question(s): is there any good way to make dynamic animations in modern (3.3+) OpenGL? My models would basically be a hierarchy of shapes (boxes or whatever), and I want to be able to rotate them on the fly. But I'm not sure how to organize that. Would I store all the translation/rotation matrices for each sub-shape? Would that put a hard limit on the number of sub-shapes a model could have? Has anyone tried something like that?
Edit: For clarification, what I did looked something like this:
Create a model: https://github.com/TheOnlySilverClaw/Birdmod/blob/master/src/main/java/silverclaw/birds/client/model/ModelOstrich.java
The model is created as a bunch of boxes in the constructor, the render and setRotationAngles methods set scale and rotations.
You should follow an OpenGL tutorial in order to understand the basics.
Let me suggest "Learning Modern 3D Graphics Programming", and especially this chapter, where you move one robot arm with multiple joints.
I did a port in Java using JOGL here, but you can easily port it over to LWJGL.
What you are looking for is exactly skeletal animation, the only difference being that you do not want to load animations for your bones but to compute/generate the transforms on the fly.
You basically have a hierarchy of bones with geometry attached to it. It looks like you want to manipulate this geometry "rigidly", so before sending your meshes/transforms to the GPU (the classic way), you start by computing the new transforms in model or world space, then send those freshly computed matrices to draw your geometry on the GPU the standard way.
As Sorin said, to compute each transform you simply iterate over your hierarchy and accumulate transforms, combining the parent bone's transform with your local transform relative to the parent.
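A bare-bones sketch of that accumulation pass in C (Mat4 and mat4_mul stand in for whatever math library you use; everything here is illustrative):

typedef struct Node {
    Mat4 local;              /* transform relative to the parent bone/shape */
    Mat4 world;              /* accumulated result in model/world space     */
    struct Node *children;
    int childCount;
} Node;

void update_transforms(Node *node, const Mat4 *parentWorld)
{
    node->world = mat4_mul(parentWorld, &node->local);   /* parent * local */
    for (int i = 0; i < node->childCount; ++i)
        update_transforms(&node->children[i], &node->world);
    /* node->world is what you upload as a uniform when drawing this node's mesh */
}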
Yes and no.
You can have your hierarchy of shapes and store a relative transform for each.
For example the "player" whould have a translation to 100,100, 10 (where the player is), and then the "head" subcomponent would have an additional translation of 0,0,5 (just a bit higher on the z axis).
You can store these as matrices (they can encode translation, rotation and scaling) and use glPushMatrix and glPopMatrix to push and pop a matrix on a stack maintained by OpenGL.
The draw() function (or whatever you call it) should look something like:
glPushMatrix();
glMultMatrixf(my_transform);   // or a combination of glTranslatef/glRotatef/glScalef
// draw this node's own mesh here
for (auto &child : children)
    child.draw();              // children inherit the accumulated transform
glPopMatrix();
This gives you a hierarchical setup in which objects move with their parent. Alternatively, you can keep a stack in main memory and do the multiplications yourself (use a library). I think the OpenGL stack may have a depth limit (implementation dependent), but if you handle it yourself the only limit is the amount of RAM you can use. Once all the matrices are multiplied, rendering takes the same amount of time either way; that is, it doesn't matter for performance how deep a mesh sits in the hierarchy.
For actual animations you need to compute the intermediate transformations. For example, for a crouch animation you probably want a few frames in between so that the camera doesn't just jump to the low position. You can do this with a time-based linear interpolation between the start and end positions, but this only covers simple animations and you still have to implement it yourself.
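For example, a minimal time-based lerp for that crouch might look like this (heights and duration are placeholders):

/* Interpolate the crouch height over "duration" seconds; the caller passes
   the time elapsed since the animation started.                            */
float crouch_height(float elapsed, float duration,
                    float standHeight, float crouchHeight)
{
    float t = elapsed / duration;
    if (t > 1.0f) t = 1.0f;                /* clamp so we stop at the end pose */
    return standHeight + (crouchHeight - standHeight) * t;
}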
Anything more complicated (i.e. modify the mesh based on the bone links) you would need to implement yourself.

Efficiently rendering a transparent terrain in OpenGL

I'm writing an OpenGL program that visualizes caves, so when I visualize the surface terrain I'd like to make it transparent, so you can see the caves below. I'm assuming I can normalize the data from a Digital Elevation Model into a grid aligned to the X/Z axes with regular spacing, and render each grid cell as two triangles. With an aligned grid I could avoid the cost of sorting when applying the painter's algorithm (to ensure proper transparency effects); instead I could just render the cells row by row, starting with the farthest row and the farthest cell of each row.
That's all well and good, but my question for OpenGL experts is, how could I draw the terrain most efficiently (and in a way that could scale to high resolution terrains) using OpenGL? There must be a better way than calling glDrawElements() once for every grid cell. Here are some ways I'm thinking about doing it (they involve features I haven't tried yet, that's why I'm asking the experts):
glMultiDrawElements Idea
Put all the terrain coordinates in a vertex buffer
Put all the coordinate indices in an element buffer
To draw, write the starting indices of each cell into an array in the desired order and call glMultiDrawElements with that array.
This seems pretty good, but I was wondering if there was any way I could avoid transferring an array of indices to the graphics card every frame, so I came up with the following idea:
Uniform Buffer Idea
This seems like a backward way of using OpenGL, but just putting it out there...
Put the terrain coordinates in a 2D array in a uniform buffer
Put coordinate index offsets 0..5 in a vertex buffer (they would have to be floats, I know)
call glDrawArraysInstanced - each instance will be one grid cell
the vertex shader examines the position of the camera relative to the terrain and determines how to order the cells, mapping gl_InstanceID to the index of the first coordinate of the cell in the uniform buffer, and setting gl_Position to the coordinate at that index plus the index offset attribute
I figure there might be shiny new OpenGL 4.0 features I'm not aware of that would be more elegant than either of these approaches. I'd appreciate any tips!
The glMultiDrawElements() approach sounds very reasonable. I would implement that first, and use it as a baseline you can compare to if you try more complex approaches.
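For reference, a minimal sketch of that baseline (the six-indices-per-cell layout and all names are illustrative):

/* One entry per grid cell, written in back-to-front order each frame. */
GLsizei counts[CELL_COUNT];
const GLvoid *offsets[CELL_COUNT];
for (int i = 0; i < CELL_COUNT; ++i) {
    int cell = drawOrder[i];                       /* farthest cell first    */
    counts[i]  = 6;                                /* two triangles per cell */
    offsets[i] = (const GLvoid *)(sizeof(GLuint) * 6 * cell);
}
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, terrainIndexBuffer);
glMultiDrawElements(GL_TRIANGLES, counts, GL_UNSIGNED_INT,
                    (const GLvoid *const *)offsets, CELL_COUNT);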
Whether you have a chance to make it faster depends on whether the processing of draw calls is an important bottleneck in your rendering. Unless the triangles you render are very small and/or your fragment shader is very simple, there's a good chance that you will be limited by fragment processing anyway. If you have profiling tools that allow you to collect data and identify bottlenecks, you can be much more targeted in your optimization efforts. Of course there is always the low-tech approach: if making the window smaller improves your performance, chances are that you're mostly fragment limited.
Back to your question: Since you asked about shiny new GL4 features, another method you could check out is indirect rendering, using glDrawElementsIndirect(). Beyond being more flexible, the main difference to glMultiDrawElements() is that the parameters used for each draw, like the start index in your case, can be sourced from a buffer. This might prevent one copy if you map this buffer, and write the start indices directly to the buffer. You could even combine it with persistent buffer mapping (look up GL_MAP_PERSISTENT_BIT) so that you don't have to map and unmap the buffer each time.
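A rough sketch of that path (GL 4.3/4.4 style, here using the multi-draw variant glMultiDrawElementsIndirect so all cells go out in one call; the struct is the standard indirect command layout, everything else is illustrative):

/* Standard layout of one indirect draw command (GL_DRAW_INDIRECT_BUFFER). */
typedef struct {
    GLuint count;          /* indices per draw: 6 per cell here */
    GLuint instanceCount;  /* 1 */
    GLuint firstIndex;     /* first index of the cell */
    GLuint baseVertex;     /* 0 */
    GLuint baseInstance;   /* 0 */
} DrawElementsIndirectCommand;

/* Create the indirect buffer once with persistent mapping (GL 4.4 / ARB_buffer_storage). */
GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
glBufferStorage(GL_DRAW_INDIRECT_BUFFER,
                CELL_COUNT * sizeof(DrawElementsIndirectCommand), NULL, flags);
DrawElementsIndirectCommand *cmds =
    glMapBufferRange(GL_DRAW_INDIRECT_BUFFER, 0,
                     CELL_COUNT * sizeof(DrawElementsIndirectCommand), flags);

/* Each frame: fill cmds[] back-to-front (real code also needs a fence so the
   GPU isn't still reading them), then submit every cell in one call.         */
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, 0, CELL_COUNT, 0);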
Your uniform buffer idea sounds pretty interesting. I'm slightly skeptical that it will perform better, but that's just a feeling, and not based on any data or direct experience. So I think you absolutely should try it, and report back on how well it works!
Stretching the scope of your question some more, you could also look into approaches for order-independent transparency rendering if you haven't considered and rejected them already. For example alpha-to-coverage is very easy to implement, and almost free if you would be using MSAA anyway. It doesn't produce very high quality transparency effects based on my limited attempts, but it could be very attractive if it does the job for your use case. Another technique for order-independent transparency is depth peeling.
If some self promotion is acceptable, I wrote an overview of some transparency rendering methods in an earlier answer here: OpenGL ES2 Alpha test problems.

OpenGL Optimization - Duplicate Vertex Stream or Call glDrawElements Repeatedly?

This is for an OpenGL ES 2.0 game on Android, though I suspect the right answer is generic to any OpenGL situation.
TL;DR - is it better to send N data to the gpu once and then make K draw calls with it; or send K*N data to the gpu once, and make 1 draw call?
More details: I'm wondering about best practices for my situation. I have a dynamic mesh whose vertices I recompute every frame - think of it as a water surface - and I need to project these vertices onto K different quads in my game. (In each case the projection is slightly different; sparing the details, you could imagine them as K different mirrors surrounding the mesh.) K is on the order of 10-25; I'm still figuring it out.
I can think of two broad options:
1. Bind the mesh as is, and call draw K different times, either changing a uniform for the shaders or messing with the fixed-function state to render to the correct quad in place (on the screen) or to different segments of a texture (which I can later use when rendering the quads to achieve the same effect).
2. Duplicate all the vertices in the mesh K times, essentially making a single vertex stream with K meshes in it, add an attribute (or a few) indicating which quad each mesh clone is supposed to project onto (and how to get there), and use vertex shaders to do the projection. I would make one draw call, but send K times as much data.
The Question: of those two options, which is generally better performance wise?
(Additionally: is there a better way to do this?
I had considered a third option, where I render the mesh details to a texture and create my K-clone geometry as a sort of dummy stream, which I could bind once and for all, and which would look up into that texture in a vertex shader to find out which vertex each dummy vertex really represents; but I've been told that texture support in vertex shaders is poor or prohibited in OpenGL ES 2.0, so I'd prefer to avoid that route.)
There is no perfect answer to this question, though I would suggest you think about the nature of real-time computer graphics and the OpenGL pipeline. Although "the GL" is required to produce results that are consistent with in-order execution, the reality is that GPUs are highly parallel beasts. They employ lots of tricks that work best if you actually have many unrelated tasks going on at the same time (some even split the whole pipeline up into discrete tiles). GDDR memory, for instance, has really high latency, so for efficiency GPUs need to be able to schedule other jobs to keep the stream processors (shader units) busy while memory is fetched for a job that is just starting.
If you are recomputing parts of your mesh each frame, then you will almost certainly want to favor more draw calls over massive CPU-to-GPU data transfers every frame. Saturating the bus with unnecessary data transfers plagues even PCI Express hardware (it is far slower than the overhead that several additional draw calls would ever add), and it can only get worse on embedded OpenGL ES systems. Having said that, there is no reason you could not simply use glBufferSubData (...) to stream in only the affected portions of your mesh and continue to draw the entire mesh in a single draw call.
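A minimal sketch of that partial update, assuming only a contiguous range of the water mesh changed this frame (the Vertex struct, buffer names and dirty-range variables are illustrative):

/* Re-upload only the dirty range of the vertex buffer, then draw it all at once. */
GLintptr  offset = firstDirtyVertex * sizeof(Vertex);
GLsizeiptr size  = dirtyVertexCount * sizeof(Vertex);
glBindBuffer(GL_ARRAY_BUFFER, waterVBO);
glBufferSubData(GL_ARRAY_BUFFER, offset, size, &cpuVertices[firstDirtyVertex]);
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);   /* one draw call */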
You might get better cache coherency if you split up (or partition the data within) the buffer and/or the draw calls, depending on your actual use case. The only way to tell decisively which will work better in your case is to profile your software on your target hardware. But all of this fails to look at the bigger picture, which is: "Why am I doing this on the CPU?!"
It sounds like what you really want is simply vertex instancing. If you can re-work your algorithm to work completely in vertex shaders by passing instance IDs you should see a massive improvement over all of the solutions I have seen you propose so far (true instancing is actually somewhere between what you described in solutions 1 and 2) :)
The actual concept of instancing is very simple and will give you benefits whether your particular version of the OpenGL API supports it at the API level or not (you can always implement it manually with vertex attributes and extra vertex buffer data). The thing is, you would not have to duplicate your data at all if you implement instancing correctly. The extra data necessary to identify each individual vertex is static, and you can always change a shader uniform and make an additional draw call (this is probably what you will have to do with OpenGL ES 2.0, since it does not offer glDrawElementsInstanced) without touching any vertex data.
You certainly will not have to duplicate your vertices K*N times, your buffer space complexity would be more like O (K + K*M), where M is the number of new components you had to add to uniquely identify each vertex so that you could calculate everything on the GPU. For "instance," you might need to number each of the vertices in your quad 1-4 and process the vertex differently in your shader depending on which vertex you're processing. In this case, the M coefficient is 1 and it does not change no matter how many instances of your quad you need to dynamically calculate each frame; N would determine the number of draw calls in OpenGL ES 2.0, not the size of your data. None of this additional storage space would be necessary in OpenGL ES 2.0 if it supported gl_VertexID :(
Instancing is the best way to make effective use of the highly parallel GPU and avoid CPU/GPU synchronization and slow bus transfers. Even though OpenGL ES 2.0 does not support instancing in the API sense, making multiple draw calls using the same vertex buffer, where the only thing you change between calls is a couple of shader uniforms, is often preferable to computing your vertices on the CPU and uploading new vertex data every frame, or to having your vertex buffer's size depend directly on the number of instances you intend to draw (yuck). You'll have to try it out and see what your hardware likes.
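A rough sketch of that pseudo-instancing loop in GL ES 2.0 terms, where the only per-"instance" change is a matrix uniform (all names are illustrative):

/* Bind the shared water mesh once... */
glBindBuffer(GL_ARRAY_BUFFER, waterVBO);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, waterIBO);
glVertexAttribPointer(posLoc, 3, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(posLoc);

/* ...then issue K cheap draw calls, changing only one uniform per "instance". */
for (int k = 0; k < K; ++k) {
    glUniformMatrix4fv(projLoc, 1, GL_FALSE, quadProjection[k]);
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);
}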
Instancing would be what you are looking for, but unfortunately it is not available in OpenGL ES 2.0. I would be in favor of sending all the vertices to the GPU and making one draw call, provided all your assets can fit into GPU memory. I once reduced draw calls from 100+ to 1, and the performance went from 15 fps to 60 fps.

OpenGL drawing the same polygon many times in multiple places

In my OpenGL app, I am drawing the same polygon approximately 50k times, but at different points on the screen. In my current approach, I do the following:
Draw the polygon once into a display list
For each instance of the polygon, push the matrix, translate to that point, and scale and rotate appropriately (the scaling of each instance will be the same; the translation and rotation will not).
However, with 50k polygons, this is 50k push and pops and computations of the correct matrix translations to move to the correct point.
A coworker of mine also suggested drawing the entire scene into a buffer and then just drawing the whole buffer with a single translation. The tradeoff here is that we need to keep all of the polygon vertices in memory rather than just the display list, but we wouldn't need to do a push/translate/scale/rotate/pop for each vertex.
The first approach is the one we currently have implemented, and I would prefer to see if we can improve that since it would require major changes to do it the second way (however, if the second way is much faster, we can always do the rewrite).
Are all of these push/pops necessary? Is there a faster way to do this? And should I be concerned that this many push/pops will degrade performance?
It depends on your ultimate goal. More recent OpenGL specs enable features for "geometry instancing". You can load all the matrices into a buffer and then draw all 50k with a single "draw instances" call (OpenGL 3+). If you are looking for a temporary fix, at the very least, load the polygon into a Vertex Buffer Object. Display Lists are very old and deprecated.
Are these 50k polygons going to move independently? If so, you'll have to put up with some form of "pushing/popping" (even though modern scene graphs do not necessarily use an explicit matrix stack). If the 50k polygons are static, you could pre-compile the entire scene into one VBO. That would make it render very fast.
If you can assume a recent version of OpenGL (>=3.1, IIRC) you might want to look at glDrawArraysInstanced and/or glDrawElementsInstanced. For older versions, you can probably use glDrawArraysInstancedEXT/glDrawElementsInstancedEXT, but they're extensions, so you'll have to access them as such.
Either way, the general idea is fairly simple: you have one mesh, and multiple transforms specifying where to draw the mesh, then you step through and draw the mesh with the different transforms. Note, however, that this doesn't necessarily give a major improvement -- it depends on the implementation (even more than most things do).
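For the instanced path, a rough sketch of feeding one model matrix per polygon through an instanced attribute (GL 3.3 style; the attribute locations, buffer names and counts are illustrative):

/* Upload all 50k per-instance model matrices once (or whenever they change). */
glBindBuffer(GL_ARRAY_BUFFER, instanceMatrixVBO);
glBufferData(GL_ARRAY_BUFFER, 50000 * 16 * sizeof(GLfloat), matrices, GL_STATIC_DRAW);

/* A mat4 attribute occupies four consecutive vec4 slots (locations 4-7 here). */
for (int i = 0; i < 4; ++i) {
    glVertexAttribPointer(4 + i, 4, GL_FLOAT, GL_FALSE, 16 * sizeof(GLfloat),
                          (const void *)(i * 4 * sizeof(GLfloat)));
    glEnableVertexAttribArray(4 + i);
    glVertexAttribDivisor(4 + i, 1);   /* advance once per instance, not per vertex */
}

/* One call draws every copy of the polygon. */
glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0, 50000);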

Efficient way of drawing in OpenGL ES

In my application I draw a lot of cubes through OpenGL ES Api. All the cubes are of same dimensions, only they are located at different coordinates in space. I can think of two ways of drawing them, but I am not sure which is the most efficient one. I am no OpenGL expert, so I decided to ask here.
Method 1, which is what I use now: since all the cubes have identical dimensions, I calculate the vertex buffer, index buffer, normal buffer and color buffer only once. During a refresh of the scene, I go over all cubes, call bufferData() for the same set of buffers, and then draw the cube's triangle mesh with a drawElements() call. Since each cube is at a different position, I translate the mvMatrix before I draw. bufferData() and drawElements() are executed for each cube. In this method I probably save a lot of memory by not recalculating the buffers every time, but I am making a lot of drawElements() calls.
Method 2 would be: treat all cubes as a set of polygons spread all over the scene. Calculate the vertex, index, color and normal buffers for all of those polygons (actually, the triangles within them) and push them to graphics card memory in a single call to bufferData(). Then draw them with a single call to drawElements(). The advantage of this approach is that I do only one bindBuffer and one drawElements call. The downside is that I use a lot of memory to create the buffers.
My experience with OpenGL is limited enough, to not know which one of the above methods is better from performance point of view.
I am using this in a WebGL app, but it's a generic OpenGL ES question.
I implemented method 2 and it wins by a landslide. The supposed downside of high memory usage seemed to be only my imagination; in fact, the garbage collector got invoked only once with method 2, while it was invoked 4-5 times with method 1.
Your OpenGL scenario might be different from mine, but if you reached here in search of performance tips, the lesson from this question is: identify the parts of your scene that don't change frequently. No matter how big they are, put them in a single buffer set (VBOs) and upload them to graphics memory the minimum number of times. That's how VBOs are meant to be used. The memory bandwidth between the client (i.e. your app) and the graphics card is precious, and you don't want to consume it often without reason.
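For illustration, a bare-bones sketch of that batching step against the GL ES 2.0 C API (WebGL's bufferData/drawElements map to the same calls; the template, position and output arrays are placeholders):

/* Clone the unit-cube template once per cube, offset by that cube's position,
   then upload the whole batch a single time.                                  */
for (int c = 0; c < cubeCount; ++c) {
    for (int v = 0; v < VERTS_PER_CUBE; ++v) {
        int dst = (c * VERTS_PER_CUBE + v) * 3;
        batched[dst + 0] = cubeTemplate[v * 3 + 0] + cubePos[c].x;
        batched[dst + 1] = cubeTemplate[v * 3 + 1] + cubePos[c].y;
        batched[dst + 2] = cubeTemplate[v * 3 + 2] + cubePos[c].z;
    }
    /* index entries for cube c are the template indices shifted by c * VERTS_PER_CUBE */
}
glBindBuffer(GL_ARRAY_BUFFER, batchVBO);
glBufferData(GL_ARRAY_BUFFER, cubeCount * VERTS_PER_CUBE * 3 * sizeof(GLfloat),
             batched, GL_STATIC_DRAW);
/* one drawElements call then renders every cube in the scene */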
Read the section "Vertex Buffer Objects" in Ch. 6 of "OpenGL ES 2.0 Programming Guide" to understand how they are supposed to be used. http://opengles-book.com/
I know that this question has already been answered, but I think it's worth pointing out the Google I/O presentation about WebGL optimization:
http://www.youtube.com/watch?v=rfQ8rKGTVlg
They cover essentially this exact same issue (lots of identical shapes with different colors/positions) and talk about some great ways to optimize such a scene (and theirs is dynamic too!)
I propose the following approach:
On load:
Generate a coordinate buffer (for one cube) and load it into a VBO (gl.glGenBuffers, gl.glBindBuffer)
On draw:
Bind the buffer (gl.glBindBuffer)
Draw each cube in a loop:
2.1. Move the current position to the center of the current cube (gl.glTranslatef(position.x, position.y, position.z))
2.2. Draw the current cube (gl.glDrawArrays)
2.3. Move the position back (gl.glTranslatef(-position.x, -position.y, -position.z))
