Dealing with lack of glDrawElementsBaseVertex in OpenGL ES

I'm working on porting a Direct3D terrain renderer to Android and just learned that OpenGL did not have an equivalent to the BaseVertexIndex parameter of DrawIndexedPrimitive until version 3.2 introduced the glDrawElementsBaseVertex method. That method is not available in OpenGL ES.
The D3D terrain renderer uses a single, large vertex buffer to hold the active terrain patches in an LRU fashion. The same 16-bit indices are used to draw each patch.
Given the lack of a base vertex index offset in OpenGL ES, I can't use the same indices to draw each patch. Furthermore, the buffer is too large for 16-bit absolute indices. The alternatives I've identified are:
Use one VBO or vertex array per patch.
Use 32-bit indices and generate new indices for every block in the VBO.
Stop using indexing and replicate vertices as needed. Note that most vertices appear in six triangles. Switching to triangle strips could help, but still doubles the number of vertices.
None of these seem very efficient compared to what was possible in D3D. Are there any other alternatives?

You didn't specify the exact data layout of your VBOs, but if your base vertex offset is not negative, you can apply a byte offset when binding the VBO to the vertex attribute (glVertexAttribPointer), as sketched below.
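A minimal sketch of that idea in C, assuming an interleaved position/normal layout; the Vertex struct, attribute locations 0 and 1, and the drawPatch() signature are illustrative, not from the original post:

    #include <stddef.h>   /* offsetof */
    #include <GLES2/gl2.h>

    typedef struct { GLfloat pos[3]; GLfloat normal[3]; } Vertex;

    /* Emulate a base vertex: rebase the attribute pointers at the patch's
       first vertex, then reuse the same 16-bit indices for every patch. */
    void drawPatch(GLuint vbo, GLuint ibo, GLsizei indexCount, GLint baseVertex)
    {
        GLsizeiptr base = (GLsizeiptr)baseVertex * sizeof(Vertex);

        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                              (const void *)(base + offsetof(Vertex, pos)));
        glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                              (const void *)(base + offsetof(Vertex, normal)));

        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
        glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);
    }

The indices can then stay in the 0..patchVertexCount-1 range regardless of where the patch lives in the large VBO.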

Related

How to sum triangle pixels with OpenGL ES

I am new to OpenGL ES. I am currently reading the docs for version 2.0 of OpenGL ES. I have a triangular 2D mesh and a 2D RGB texture, and I need to compute, for every triangle, the following quantities:
where N is the number of pixels of a given triangle. These quantities are needed for further CPU processing. The idea would be to use GPU rasterization to sum quantities over triangles. I am not able to see how to do this with OpenGL ES 2.0 (which is the most popular version among Android devices). Another question I have is: is it possible to do this type of computation with OpenGL ES 3.0?
I am not able to see how to do this with OpenGL ES 2.0
You can't; the API simply isn't designed to do it.
Is it possible to do this type of computation with OpenGL ES 3.0?
In the general case, no. If you can use OpenGL ES 3.1, and if you can control the input geometry, then a viable algorithm would be (see the sketch after this list):
Add a vertex attribute which is the primitive ID for each triangle in the mesh (which we can use as an array index).
Allocate an atomics buffer (GL_ATOMIC_COUNTER_BUFFER) with one atomic per primitive, pre-zeroed.
In the fragment shader, increment the atomic corresponding to the current primitive (loaded from the vertex attribute).
Performance is likely to be pretty horrible though - atomics generally suck for most GPU implementations.
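A hedged sketch of the counting step, with two stated assumptions: an ES 3.1 context whose implementation exposes fragment-stage shader storage blocks (the spec minimum is zero), and a flat varying vPrimId carrying the per-triangle ID. Note it uses a shader storage buffer with atomicAdd() rather than a literal atomic_uint array, because arrays of atomic counters may only be indexed with dynamically uniform expressions:

    /* GLSL ES 3.10 fragment shader, embedded as a C string. */
    static const char *countFrag =
        "#version 310 es\n"
        "precision mediump float;\n"
        "flat in uint vPrimId;\n"               /* per-triangle ID from the VS */
        "layout(std430, binding = 0) buffer Counters { uint count[]; };\n"
        "out vec4 fragColor;\n"
        "void main() {\n"
        "    atomicAdd(count[vPrimId], 1u);\n"  /* one increment per fragment */
        "    fragColor = vec4(0.0);\n"
        "}\n";

After the draw completes, the buffer can be mapped with glMapBufferRange(), and count[i] holds N, the number of fragments rasterized for triangle i.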

In OpenGL ES, can I use a vertex buffer, array buffer, etc. for shader-shared matrices?

As OpenGL ES does not support shared "uniform blocks", I was wondering if there is a way to store matrices that can be referenced by a number of different shaders; a simple example would be a worldToViewport or worldToEye matrix, which would not change for an entire frame and which all shaders would reference. I saw one post where one uses 3 or 4 dot() calls on a vertex to transform it from 4 "column vectors", but I am wondering if there is a way to assign the buffer data to a "mat4" in the shader.
Ah yes, the need for this is WebGL, which at the moment seems to support only OpenGL ES 2.0.
I wonder if it supports indexed attribute buffers, as I assume they don't need to be any particular size relative to the size of the position vertex array.
Then, if one can use a hard-coded or calculated index into the attribute buffer (in the shader), and if one can bind more than one attribute buffer at a time and access all the buffers bound to the shader simultaneously...
I see that if all of that is true, it might work. I need a good language/architecture reference on shaders, as I am somewhat new to shader programming and I'm trying to design a wall without knowing the shapes of the bricks :)
Vertex attributes are per-vertex, so there is no way to share vertex attributes amongst multiple vertices.
OpenGL ES 2.0 upwards has CPU-side uniforms, which must be uploaded individually from the CPU at draw time. Uniforms belong to the program object, so for uniforms which are constant for a frame you only have to modify each program once; the cost isn't necessarily proportional to draw count.
OpenGL ES 3.0 onwards has Uniform Buffer Objects (UBOs) which allow you to load uniforms from a buffer in memory.
I'm not sure what you mean by "doesn't support shared uniform blocks", as that's pretty much what a UBO is, albeit one that won't work on older hardware which only supports OpenGL ES 2.x.
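A minimal sketch of sharing one per-frame matrix through a UBO (ES 3.0+, in C); the block name FrameMatrices, binding point 0, and the worldToViewport array are illustrative assumptions:

    /* Create the buffer once and attach it to binding point 0. */
    GLuint ubo;
    glGenBuffers(1, &ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferData(GL_UNIFORM_BUFFER, 16 * sizeof(GLfloat), NULL, GL_DYNAMIC_DRAW);
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);

    /* For each program, point its "FrameMatrices" block at binding 0. */
    GLuint idx = glGetUniformBlockIndex(program, "FrameMatrices");
    glUniformBlockBinding(program, idx, 0);

    /* Once per frame: every program bound to the block sees the update. */
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, 16 * sizeof(GLfloat), worldToViewport);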

OpenGL Optimization - Duplicate Vertex Stream or Call glDrawElements Repeatedly?

This is for an OpenGL ES 2.0 game on Android, though I suspect the right answer is generic to any OpenGL situation.
TL;DR - is it better to send N data to the GPU once and then make K draw calls with it, or to send K*N data to the GPU once and make 1 draw call?
More details: I'm wondering about best practices for my situation. I have a dynamic mesh whose vertices I recompute every frame - think of it as a water surface - and I need to project these vertices onto K different quads in my game. (In each case the projection is slightly different; sparing the details, you could imagine them as K different mirrors surrounding the mesh.) K is on the order of 10-25; I'm still figuring it out.
I can think of two broad options:
Bind the mesh as is, and call draw K different times, either changing a uniform for the shaders or messing with the fixed-function state, to render to the correct quad in place (on the screen) or to different segments of a texture (which I can later use when rendering the quads to achieve the same effect).
Duplicate all the vertices in the mesh K times, essentially making a single vertex stream with K meshes in it, add an attribute (or a few) indicating which quad each mesh clone is supposed to project onto (and how to get there), and use vertex shaders to project. I would make one call to draw, but send K times as much data.
The question: of those two options, which is generally better performance-wise?
(Additionally: is there a better way to do this?
I had considered a third option, where I render the mesh details to a texture and create my K-clone geometry as a sort of dummy stream, which I could bind once and for all, whose vertex shader looks up each vertex in the texture to find out which vertex it really represents; but I've been told that texture support in vertex shaders is poor or prohibited in OpenGL ES 2.0, so I would prefer to avoid that route.)
There is no perfect answer to this question, though I would suggest you think about the nature of real-time computer graphics and the OpenGL pipeline. Although "the GL" is required to produce results that are consistent with in-order execution, the reality is that GPUs are highly parallel beasts. They employ lots of tricks that work best if you actually have many unrelated tasks going on at the same time (some even split the whole pipeline up into discrete tiles). GDDR memory, for instance, has really high latency, so for efficiency GPUs need to be able to schedule other jobs to keep the stream processors (shader units) busy while memory is fetched for a job that is just starting.
If you are recomputing parts of your mesh each frame, then you will almost certainly want to favor more draw calls over massive CPU->GPU data transfers every frame. Saturating the bus with unnecessary data transfers plagues even PCI Express hardware (it is far more costly than the overhead that several additional draw calls would ever add), and it can only get worse on embedded OpenGL ES systems. Having said that, there is no reason you could not simply use glBufferSubData (...) to stream in only the affected portions of your mesh and continue to draw the entire mesh in a single draw call, as sketched below.
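A tiny sketch of that partial-update idea in C (the Vertex type, meshVbo, and the dirty-range variables are illustrative):

    /* Re-upload only the span of vertices recomputed this frame. */
    glBindBuffer(GL_ARRAY_BUFFER, meshVbo);
    glBufferSubData(GL_ARRAY_BUFFER,
                    firstDirty * sizeof(Vertex),   /* byte offset into the VBO */
                    dirtyCount * sizeof(Vertex),   /* byte length of the span  */
                    cpuVertices + firstDirty);     /* CPU copy of the vertices */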
You might get better cache coherency if you split (or partition the data within) the buffer and/or draw calls, depending on your actual use case. The only way to decisively tell which is going to work better in your case is to profile your software on your target hardware. But all of this fails to look at the bigger picture, which is: "Why am I doing this on the CPU?!"
It sounds like what you really want is simply vertex instancing. If you can re-work your algorithm to work completely in vertex shaders by passing instance IDs, you should see a massive improvement over all of the solutions I have seen you propose so far (true instancing is actually somewhere between what you described in solutions 1 and 2) :)
The actual concept of instancing is very simple and will give you benefits whether your particular version of the OpenGL API supports it at the API level or not (you can always implement it manually with vertex attributes and extra vertex buffer data). The thing is, you would not have to duplicate your data at all if you implement instancing correctly. The extra data necessary to identify each individual vertex is static, and you can always change a shader uniform and make an additional draw call (this is probably what you will have to do with OpenGL ES 2.0, since it does not offer glDrawElementsInstanced) without touching any vertex data.
You certainly will not have to duplicate your vertex data K times (i.e. K*N in total); your buffer space complexity would be more like O(N + N*M), where M is the number of new components you had to add to uniquely identify each vertex so that you could calculate everything on the GPU. For "instance," you might need to number each of the vertices in your quad 1-4 and process each vertex differently in your shader depending on which vertex you're processing. In this case the M coefficient is 1, and it does not change no matter how many instances of your quad you need to dynamically calculate each frame; K would determine the number of draw calls in OpenGL ES 2.0, not the size of your data. None of this additional storage space would be necessary in OpenGL ES 2.0 if it supported gl_VertexID :(
Instancing is the best way to make effective use of a highly parallel GPU and to avoid CPU/GPU synchronization and slow bus transfers. Even though OpenGL ES 2.0 does not support instancing in the API sense, making multiple draw calls with the same vertex buffer, where the only thing you change between calls is a couple of shader uniforms, is often preferable to computing your vertices on the CPU and uploading new vertex data every frame, or to having your vertex buffer's size depend directly on the number of instances you intend to draw (yuck). You'll have to try it out and see what your hardware likes; a sketch follows.
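A hedged sketch of that manual-instancing loop in ES 2.0 C (attribute location 0, the uQuadXform uniform location, and quadTransforms are illustrative assumptions; the program is assumed already bound with glUseProgram):

    /* Bind the shared mesh once; only a uniform changes per "instance". */
    glBindBuffer(GL_ARRAY_BUFFER, meshVbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, meshIbo);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0);
    glEnableVertexAttribArray(0);

    for (int i = 0; i < K; ++i) {
        /* Per-quad projection; no vertex data is touched between calls. */
        glUniformMatrix4fv(uQuadXform, 1, GL_FALSE, quadTransforms[i]);
        glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);
    }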
Instancing would be what you are looking for, but unfortunately it is not available in OpenGL ES 2.0. I would be in favor of sending all the vertices to the GPU and making one draw call, provided all your assets can fit in GPU memory. In my experience, reducing draw calls from 100+ to 1 took performance from 15 fps to 60 fps.

Implementing sparse matrix construction and multiplication in OpenGL ES

I have googled around but haven't found an answer that suits me for OpenGL.
I want to construct a sparse matrix with a single main diagonal and around 9 off-diagonals. These diagonals aren't necessarily next to the main diagonal, and they wrap around. Each diagonal is an image in row-major format, i.e. a vector of size NxM.
The size of the matrix is (NxM)x(NxM)
My question is as follows:
After some messing around with the math, I have arrived at the basic units of my operation. It involves a pixel-by-pixel multiplication of two images (WITHOUT limiting the value of the result, i.e. so it can be above 1 or below 0), storing the resulting image, and then adding a bunch of the resulting images (SAME as above).
How can I multiply and add images on a pixel-by-pixel basis in OpenGL? Is it easier in 1.1 or 2.0? Will the use of textures cause hard clamping of the results to between 0 and 1? Will this maximize the use of the GPU cores?
In order to be able to store values outside the [0,1] range, you would have to use floating-point textures. There is no support for these in OpenGL ES 1.1, and for OpenGL ES 2.0 they are an optional extension (see the other SO question).
In case your implementation supports it, you could then write a fragment program to do the required math, as sketched below.
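A minimal sketch of such a fragment program, assuming the GL_OES_texture_float extension and a float-renderable FBO; the sampler and varying names are illustrative:

    /* GLSL ES 1.00 fragment shader, embedded as a C string: the per-pixel
       product of two images, unclamped when the render target is float. */
    static const char *multiplyFrag =
        "precision highp float;\n"
        "uniform sampler2D uImageA;\n"
        "uniform sampler2D uImageB;\n"
        "varying vec2 vTexCoord;\n"
        "void main() {\n"
        "    gl_FragColor = texture2D(uImageA, vTexCoord)\n"
        "                 * texture2D(uImageB, vTexCoord);\n"
        "}\n";

The additions work the same way with +; accumulating via additive blending into the float target is also possible, though blending to float targets may itself be an optional extension.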
In OpenGL ES 1.1 you could use the glTexEnv call to set up how the pixels from different texture units are combined. You could then use "modulate" or "add" to multiply/add the values. The result would be clamped to the [0,1] range, though.
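For example, a fixed-function sketch of the modulate path (the texture object names are illustrative; each enabled unit also needs its own texture-coordinate array):

    /* Unit 0 outputs imageA as-is; unit 1 multiplies imageB on top.
       The fixed-function pipeline clamps the result to [0,1]. */
    glActiveTexture(GL_TEXTURE0);
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, imageA);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);

    glActiveTexture(GL_TEXTURE1);
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, imageB);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);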

Efficient way of drawing in OpenGL ES

In my application I draw a lot of cubes through the OpenGL ES API. All the cubes have the same dimensions; they are just located at different coordinates in space. I can think of two ways of drawing them, but I am not sure which is the most efficient one. I am no OpenGL expert, so I decided to ask here.
Method 1, which is what I use now: since all the cubes have identical dimensions, I calculate the vertex buffer, index buffer, normal buffer and color buffer only once. During a refresh of the scene, I go over all cubes, do bufferData() for the same set of buffers, and then draw the triangle mesh of the cube using a drawElements() call. Since each cube is at a different position, I translate the mvMatrix before I draw. bufferData() and drawElements() are executed for each cube. In this method I probably save a lot of memory by not calculating the buffers every time, but I am making a lot of drawElements() calls.
Method 2 would be: treat all cubes as a set of polygons spread all over the scene. Calculate the vertex, index, color and normal buffers for each polygon (actually the triangles within the polygons) and push them to graphics card memory in a single call to bufferData(). Then draw them with a single call to drawElements(). The advantage of this approach is that I do only one bindBuffer and one drawElements call. The downside is that I use a lot of memory to create the buffers.
My experience with OpenGL is too limited to know which of the above methods is better from a performance point of view.
I am using this in a WebGL app, but it's a generic OpenGL ES question.
I implemented method 2 and it wins by a landslide. The supposed downside of high memory use turned out to be only my imagination; in fact, the garbage collector was invoked only once in method 2, while it was invoked 4-5 times in method 1.
Your OpenGL scenario might be different from mine, but if you arrived here in search of performance tips, the lesson from this question is: identify the parts of your scene that don't change frequently. No matter how big they are, put them in a single buffer set (VBOs) and upload them to graphics memory the minimum number of times. That's how VBOs are meant to be used. The memory bandwidth between the client (i.e. your app) and the graphics card is precious, and you don't want to consume it often without reason.
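A minimal sketch of that lesson in GL-flavored C (the same idea maps directly to WebGL's bufferData/drawArrays); appendCubeVertices() is a hypothetical helper that writes 36 pre-translated vertices per cube:

    /* Build all static cubes once, upload once, draw once per frame. */
    GLsizeiptr floats = (GLsizeiptr)cubeCount * 36 * 3;
    GLfloat *all = malloc(floats * sizeof(GLfloat));        /* <stdlib.h> */
    for (int i = 0; i < cubeCount; ++i)
        appendCubeVertices(all + i * 36 * 3, positions[i]); /* hypothetical */

    glBindBuffer(GL_ARRAY_BUFFER, batchVbo);
    glBufferData(GL_ARRAY_BUFFER, floats * sizeof(GLfloat),
                 all, GL_STATIC_DRAW);                      /* one upload */
    free(all);

    /* Per frame, a single draw call: */
    glDrawArrays(GL_TRIANGLES, 0, cubeCount * 36);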
Read the section "Vertex Buffer Objects" in Ch. 6 of "OpenGL ES 2.0 Programming Guide" to understand how they are supposed to be used. http://opengles-book.com/
I know that this question is already answered, but I think it's worth pointing out the Google IO presentation about WebGL optimization:
http://www.youtube.com/watch?v=rfQ8rKGTVlg
They cover essentially this exact same issue (lots of identical shapes with different colors/positions) and talk about some great ways to optimize such a scene (and theirs is dynamic too!).
I propose the following approach (see the C sketch after these steps):
On load:
Generate a coordinates buffer (for one cube) and load it into a VBO (gl.glGenBuffers, gl.glBindBuffer).
On draw:
Bind the buffer (gl.glBindBuffer).
Draw each cube in a loop:
2.1. Move the current position to the center of the current cube (gl.glTranslatef(position.x, position.y, position.z)).
2.2. Draw the current cube (gl.glDrawArrays).
2.3. Move the position back (gl.glTranslatef(-position.x, -position.y, -position.z)).
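A sketch of that loop in C against the ES 1.x fixed-function pipeline (cubeVbo, positions, and the 36-vertex cube are illustrative; it uses glPushMatrix/glPopMatrix instead of translating back, which is equivalent):

    /* On load: one cube's vertices live in cubeVbo. On draw: */
    glBindBuffer(GL_ARRAY_BUFFER, cubeVbo);
    glVertexPointer(3, GL_FLOAT, 0, 0);
    glEnableClientState(GL_VERTEX_ARRAY);

    for (int i = 0; i < cubeCount; ++i) {
        glPushMatrix();
        glTranslatef(positions[i].x, positions[i].y, positions[i].z);
        glDrawArrays(GL_TRIANGLES, 0, 36);   /* 12 triangles per cube */
        glPopMatrix();
    }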
