OpenGL tile rendering: most efficient way? - performance

I am creating a tile-based 2D game as a way of learning basic "modern" OpenGL concepts. I'm using shaders with OpenGL 2.1, and am familiar with the rendering pipeline and how to actually draw geometry on-screen. What I'm wondering is the best way to organize a tilemap to render quickly and efficiently. I have thought of several potential methods:
1.) Store the quad representing a single tile (vertices and texture coordinates) in a VBO and render each tile with a separate draw* call, translating it to the correct position onscreen and using uniform2i to give the location in the texture atlas for that particular tile;
2.) Keep a VBO containing every tile onscreen (already-computed screen coordinates and texture atlas coordinates), using BufferSubData to update the tiles every frame but using a single draw* call;
3.) Keep VBOs containing static NxN "chunks" of tiles, drawing however many chunks of tiles are at least partially visible onscreen and translating them each into position.
*I'd like to stay away from the last option if possible unless rendering chunks of 64x64 is not too inefficient. Tiles are loaded into memory in blocks of that size, and even though only about 20x40 tiles are visible onscreen at a time, I would have to render up to four chunks at once. This method would also complicate my code in several other ways.
So, which of these is the most efficient way to render a screen of tiles? Are there any better methods?

You could do any one of these and they would probably be fine; what you're proposing to render is very, very simple.
#1 will definitely be worse in principle than the other options, because you would be drawing many extremely simple “models” rather than letting the GPU do a whole lot of batch work on one draw call. However, if you have only 20×40 = 800 tiles visible on screen at once, then this is a trivial amount of work for any modern CPU and GPU (unless you're doing some crazy fragment shader).
I recommend you go with whichever is simplest to program for you, so that you can continue work on your game. I imagine this would be #1, or possibly #2. If and when you find yourself with a performance problem, do whichever of #2 or #3 (64×64 sounds like a fine chunk size) lets you spend the least CPU time on your program's part of drawing (i.e. updating the buffer(s)).
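If you do end up on #2, the CPU-side part is mostly a loop that fills one vertex array for the visible tiles. Here is a minimal sketch of that step only; `TileVertex`, `buildTileBatch`, and the assumption that the atlas is a regular grid of equally sized tiles are all made up for illustration, and the resulting array is what you would hand to glBufferSubData before a single draw call.

```cpp
#include <cstddef>
#include <vector>

// One vertex of a tile quad: screen position plus atlas texture coordinates.
struct TileVertex {
    float x, y;  // position in pixels
    float u, v;  // normalized texture-atlas coordinates
};

// Builds two triangles (six vertices) per visible tile.
// tileIds holds cols*rows atlas indices in row-major order. The atlas is
// ASSUMED to be a grid of atlasCols x atlasRows equally sized tiles.
std::vector<TileVertex> buildTileBatch(const std::vector<int>& tileIds,
                                       int cols, int rows, float tileSize,
                                       int atlasCols, int atlasRows) {
    std::vector<TileVertex> out;
    out.reserve(static_cast<std::size_t>(cols) * rows * 6);
    const float du = 1.0f / atlasCols;
    const float dv = 1.0f / atlasRows;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const int id = tileIds[static_cast<std::size_t>(r) * cols + c];
            const float x0 = c * tileSize, y0 = r * tileSize;
            const float x1 = x0 + tileSize, y1 = y0 + tileSize;
            const float u0 = (id % atlasCols) * du, v0 = (id / atlasCols) * dv;
            const float u1 = u0 + du, v1 = v0 + dv;
            // First triangle of the quad.
            out.push_back({x0, y0, u0, v0});
            out.push_back({x1, y0, u1, v0});
            out.push_back({x1, y1, u1, v1});
            // Second triangle of the quad.
            out.push_back({x0, y0, u0, v0});
            out.push_back({x1, y1, u1, v1});
            out.push_back({x0, y1, u0, v1});
        }
    }
    return out;
}
```

For a 20×40 screen this produces 4,800 vertices per rebuild, which is small enough that rebuilding every frame is unlikely to be the bottleneck.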

I've been recently learning modern OpenGL myself, through OpenGL ES 2.0 on Android. The OpenGL ES 2.0 Programming Guide recommends an "array of structures", that is,
"Store vertex attributes together in a single buffer. The structure represents all attributes of a vertex and we have an array of these attributes per vertex."
While this may seem like it would initially consume a lot of space, it allows for efficient rendering using VBOs and flexibility in texture mapping each tile. I recently did a tiled hex grid using interleaved arrays containing vertex, normals, color, and texture data for a 20x20 tile hex grid on a Droid 2. So far things are running smoothly.
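To make the "array of structures" idea concrete, here is a rough sketch of an interleaved vertex with the four attributes mentioned above. The `Vertex` struct and the constant names are my own, not from the book; the stride and offsets are the values you would pass as the `stride` and `pointer` arguments of glVertexAttribPointer when the buffer is bound.

```cpp
#include <cstddef>
#include <type_traits>

// One interleaved vertex: all attributes of a vertex stored together.
struct Vertex {
    float position[3];
    float normal[3];
    unsigned char color[4];  // RGBA, one byte per channel
    float texCoord[2];
};

static_assert(std::is_standard_layout<Vertex>::value,
              "offsetof requires a standard-layout type");

// Distance in bytes from one vertex to the next in the buffer.
constexpr std::size_t kStride = sizeof(Vertex);
// Byte offset of each attribute inside a vertex.
constexpr std::size_t kPositionOffset = offsetof(Vertex, position);
constexpr std::size_t kNormalOffset   = offsetof(Vertex, normal);
constexpr std::size_t kColorOffset    = offsetof(Vertex, color);
constexpr std::size_t kTexCoordOffset = offsetof(Vertex, texCoord);
```

Using offsetof instead of hand-computed numbers keeps the attribute setup correct if the struct is later reordered or padded.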


Efficiently rendering a transparent terrain in OpenGL

I'm writing an OpenGL program that visualizes caves, so when I visualize the surface terrain I'd like to make it transparent, so you can see the caves below. I'm assuming I can normalize the data from a Digital Elevation Model into a grid aligned to the X/Z axes with regular spacing, and render each grid cell as two triangles. With an aligned grid I could avoid the cost of sorting when applying the painter's algorithm (to ensure proper transparency effects); instead I could just render the cells row by row, starting with the farthest row and the farthest cell of each row.
That's all well and good, but my question for OpenGL experts is, how could I draw the terrain most efficiently (and in a way that could scale to high resolution terrains) using OpenGL? There must be a better way than calling glDrawElements() once for every grid cell. Here are some ways I'm thinking about doing it (they involve features I haven't tried yet, that's why I'm asking the experts):
glMultiDrawElements Idea
Put all the terrain coordinates in a vertex buffer
Put all the coordinate indices in an element buffer
To draw, write the starting indices of each cell into an array in the desired order and call glMultiDrawElements with that array.
This seems pretty good, but I was wondering if there was any way I could avoid transferring an array of indices to the graphics card every frame, so I came up with the following idea:
Uniform Buffer Idea
This seems like a backward way of using OpenGL, but just putting it out there...
Put the terrain coordinates in a 2D array in a uniform buffer
Put coordinate index offsets 0..5 in a vertex buffer (they would have to be floats, I know)
Call glDrawArraysInstanced - each instance will be one grid cell
The vertex shader examines the position of the camera relative to the terrain and determines how to order the cells, mapping gl_InstanceID to the index of the first coordinate of the cell in the Uniform Buffer, and setting gl_Position to the coordinate at this index + the index offset attribute
I figure there might be shiny new OpenGL 4.0 features I'm not aware of that would be more elegant than either of these approaches. I'd appreciate any tips!
The glMultiDrawElements() approach sounds very reasonable. I would implement that first, and use it as a baseline you can compare to if you try more complex approaches.
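For that baseline, the per-frame work is building the array of start offsets in back-to-front order. Below is a sketch of one way to do it, assuming each cell (row, col) owns 6 consecutive 32-bit indices at position (row * cols + col) * 6 in the element buffer, and that the camera has already been mapped to a grid row and column. The function names are hypothetical. The offsets are in bytes because that is what glMultiDrawElements expects in its `indices` array when an element buffer is bound.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Returns 0..n-1 ordered so the entries farthest from `cam` come first.
std::vector<int> farthestFirst(int n, int cam) {
    std::vector<int> order(n);
    for (int i = 0; i < n; ++i) order[i] = i;
    std::stable_sort(order.begin(), order.end(), [cam](int a, int b) {
        return std::abs(a - cam) > std::abs(b - cam);
    });
    return order;
}

// Byte offsets of each cell's first index, farthest row first and, within
// each row, farthest cell first (the order described in the question).
std::vector<std::size_t> cellOffsetsBackToFront(int rows, int cols,
                                                int camRow, int camCol) {
    std::vector<std::size_t> offsets;
    offsets.reserve(static_cast<std::size_t>(rows) * cols);
    const std::vector<int> rowOrder = farthestFirst(rows, camRow);
    const std::vector<int> colOrder = farthestFirst(cols, camCol);
    for (int r : rowOrder) {
        for (int c : colOrder) {
            const std::size_t cell = static_cast<std::size_t>(r) * cols + c;
            offsets.push_back(cell * 6 * sizeof(std::uint32_t));
        }
    }
    return offsets;
}
```

The column order only depends on the camera column, so it is computed once and reused for every row.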
Whether you have a chance to make it faster depends on whether the processing of draw calls is an important bottleneck in your rendering. Unless the triangles you render are very small, and/or your fragment shader is very simple, there's a good chance that you will be limited by fragment processing anyway. If you have profiling tools that allow you to collect data and identify bottlenecks, you can be much more targeted in your optimization efforts. Of course there is always the low-tech approach: if making the window smaller improves your performance, chances are that you're mostly fragment limited.
Back to your question: Since you asked about shiny new GL4 features, another method you could check out is indirect rendering, using glDrawElementsIndirect(). Beyond being more flexible, the main difference from glMultiDrawElements() is that the parameters used for each draw, like the start index in your case, can be sourced from a buffer. This can save one copy if you map the buffer and write the start indices directly into it. You could even combine it with persistent buffer mapping (look up GL_MAP_PERSISTENT_BIT) so that you don't have to map and unmap the buffer each time.
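To give an idea of what ends up in that buffer: each indexed indirect draw reads five 32-bit values. The struct below mirrors the layout given in the OpenGL documentation; `makeCellCommands` is a hypothetical helper that assumes 6 indices per cell and one instance per draw. Note that firstIndex counts indices, not bytes. An array of these records is also what glMultiDrawElementsIndirect (GL 4.3) consumes if you want all cells in one call.

```cpp
#include <cstdint>
#include <vector>

// Parameters of one indexed indirect draw, tightly packed as 5 x 32 bits.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices to draw
    std::uint32_t instanceCount;  // number of instances
    std::uint32_t firstIndex;     // first index, in units of indices
    std::int32_t  baseVertex;     // value added to each index
    std::uint32_t baseInstance;   // left at 0 here
};

// Builds one command per grid cell from the cells' first indices,
// ASSUMING each cell is two triangles (6 indices).
std::vector<DrawElementsIndirectCommand> makeCellCommands(
        const std::vector<std::uint32_t>& firstIndices) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(firstIndices.size());
    for (std::uint32_t first : firstIndices) {
        cmds.push_back({6u, 1u, first, 0, 0u});
    }
    return cmds;
}
```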
Your uniform buffer idea sounds pretty interesting. I'm slightly skeptical that it will perform better, but that's just a feeling, and not based on any data or direct experience. So I think you absolutely should try it, and report back on how well it works!
Stretching the scope of your question some more, you could also look into approaches for order-independent transparency rendering if you haven't considered and rejected them already. For example alpha-to-coverage is very easy to implement, and almost free if you would be using MSAA anyway. It doesn't produce very high quality transparency effects based on my limited attempts, but it could be very attractive if it does the job for your use case. Another technique for order-independent transparency is depth peeling.
If some self promotion is acceptable, I wrote an overview of some transparency rendering methods in an earlier answer here: OpenGL ES2 Alpha test problems.

SDL accelerated rendering

I am trying to understand the whole 2D accelerated rendering process using SDL 2.0.
So my question is which would be the most efficient way to draw circles in the screen and why?
Some ways would be:
First, create a software surface, draw the necessary pixels on that surface, create a texture out of it, and lastly copy that texture to the rendering target.
Another implementation would be to draw the circle with multiple calls to SDL_RenderDrawLine. I think this is how it is implemented in SDL 2.0 gfx.
Or is there a more efficient way to do all of this?
Take this question more generally: what if I wanted to manually draw other shapes that probably couldn't be rendered easily with the 2D rendering API that SDL provides (using draw line or rectangle)?
With circles this is a fairly complicated question: the visual quality you want to achieve is what drives performance. The cost of drawing lots of short lines varies greatly with how close to a true circle you want to get. If you are happy to use, say, 60 lines, small shapes will look nearly seamless and the performance will likely be better (depending on the user's hardware), but when scaled up the shape will stop looking like a circle. Note also that SDL_RenderDrawLines will be much faster than repeated SDL_RenderDrawLine calls for many lines, as it avoids lots of context switches for rendering calls.
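As a rough sketch of the line-based approach: the helper below picks a segment count from a pixel tolerance and generates a closed polyline. `Point` is a stand-in with the same two int fields as SDL_Point, and the function names are mine. The segment count follows from the gap between a chord and the true arc, which for n segments is r * (1 - cos(pi / n)).

```cpp
#include <cmath>
#include <vector>

struct Point { int x, y; };  // stand-in with the same fields as SDL_Point

// Smallest segment count whose chords stay within maxError pixels of the
// true circle: solve r * (1 - cos(pi / n)) <= maxError for n.
int segmentsForError(double radius, double maxError) {
    if (maxError >= radius) return 3;
    const double pi = std::acos(-1.0);
    const int n = static_cast<int>(
        std::ceil(pi / std::acos(1.0 - maxError / radius)));
    return n < 3 ? 3 : n;
}

// Returns segments + 1 points; the last repeats the first to close the loop.
std::vector<Point> circlePolyline(int cx, int cy, double radius, int segments) {
    std::vector<Point> pts;
    pts.reserve(segments + 1);
    const double step = 2.0 * std::acos(-1.0) / segments;
    for (int i = 0; i < segments; ++i) {
        const int dx = static_cast<int>(std::lround(radius * std::cos(i * step)));
        const int dy = static_cast<int>(std::lround(radius * std::sin(i * step)));
        pts.push_back({cx + dx, cy + dy});
    }
    pts.push_back(pts.front());
    return pts;
}
```

With real SDL_Point values, the whole vector can go to a single SDL_RenderDrawLines call. For a radius of 100 pixels and half a pixel of tolerance this works out to 32 segments, so "thousands of lines" is rarely needed at screen sizes.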
However if you need a very accurate circle with thousands of lines to get a good approximation it will be faster to simply use a bitmap and scale and blit it. This will also give you a 'smoother' feel to the circle.
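The bitmap route starts with filling a pixel buffer once. Here is a minimal sketch that rasterizes a filled disc into a square array of 32-bit pixels; the function name and the convention that 0 means transparent are assumptions. In SDL 2 such a buffer could be wrapped in a surface, turned into a texture with SDL_CreateTextureFromSurface, and then drawn scaled with SDL_RenderCopy every frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fills a (2r+1) x (2r+1) row-major pixel buffer with a disc of the given
// color. Pixels outside the disc are left at 0 (ASSUMED to be transparent).
std::vector<std::uint32_t> rasterizeDisc(int radius, std::uint32_t color) {
    const int size = 2 * radius + 1;
    std::vector<std::uint32_t> pixels(static_cast<std::size_t>(size) * size, 0u);
    for (int y = -radius; y <= radius; ++y) {
        for (int x = -radius; x <= radius; ++x) {
            if (x * x + y * y <= radius * radius) {
                const std::size_t row = static_cast<std::size_t>(y + radius);
                pixels[row * size + (x + radius)] = color;
            }
        }
    }
    return pixels;
}
```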
In my personal opinion I do not think the hardware accelerated render API has much use outside of some special uses such as graph rendering and perhaps very simple GUI drawing. For anything more complex I would usually use bitmap based drawing.
With regards to the second part, it again depends on the accuracy of any arcs you need to draw. If you can easily approximate the shape into a few tens of lines it will be fast, otherwise the pixel method is better.

OpenGL drawing the same polygon many times in multiple places

In my opengl app, I am drawing the same polygon approximately 50k times but at different points on the screen. In my current approach, I do the following:
Draw the polygon once into a display list
For each instance of the polygon, push the matrix, translate to that point, then scale and rotate appropriately (the scale is the same for every instance; the translation and rotation are not).
However, with 50k polygons, this is 50k push and pops and computations of the correct matrix translations to move to the correct point.
A coworker of mine also suggested drawing the entire scene into a buffer and then just drawing the whole buffer with a single translation. The tradeoff here is that we need to keep all of the polygon vertices in memory rather than just the display list, but we wouldn't need to do a push/translate/scale/rotate/pop for each vertex.
The first approach is the one we currently have implemented, and I would prefer to see if we can improve that since it would require major changes to do it the second way (however, if the second way is much faster, we can always do the rewrite).
Are all of these push/pops necessary? Is there a faster way to do this? And should I be concerned that this many push/pops will degrade performance?
It depends on your ultimate goal. More recent OpenGL specs enable features for "geometry instancing". You can load all the matrices into a buffer and then draw all 50k with a single "draw instances" call (OpenGL 3+). If you are looking for a temporary fix, at the very least, load the polygon into a Vertex Buffer Object. Display Lists are very old and deprecated.
Are these 50k polygons going to move independently? You'll have to put up with some form of "pushing/popping" (even though modern scene graphs do not necessarily use an explicit matrix stack). If the 50k polygons are static, you could pre-compile the entire scene into one VBO. That would make it render very fast.
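For the static case, "pre-compiling" just means applying each instance's transform on the CPU once and storing the results in one big vertex array. A 2D sketch under assumed names (`Vec2`, `Placement`, `bakeInstances`), applying the shared scale first, then the rotation, then the translation:

```cpp
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };
struct Placement { float x, y, angle; };  // translation and rotation (radians)

// Transforms every polygon vertex by every placement and returns all the
// results in one array, ready to be uploaded to a single VBO.
std::vector<Vec2> bakeInstances(const std::vector<Vec2>& polygon,
                                const std::vector<Placement>& placements,
                                float scale) {
    std::vector<Vec2> out;
    out.reserve(polygon.size() * placements.size());
    for (const Placement& p : placements) {
        const float c = std::cos(p.angle) * scale;
        const float s = std::sin(p.angle) * scale;
        for (const Vec2& v : polygon) {
            out.push_back({c * v.x - s * v.y + p.x,
                           s * v.x + c * v.y + p.y});
        }
    }
    return out;
}
```

The memory cost is the one mentioned in the question: 50k copies of the polygon's vertices instead of one.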
If you can assume a recent version of OpenGL (>=3.1, IIRC) you might want to look at glDrawArraysInstanced and/or glDrawElementsInstanced. For older versions, you can probably use glDrawArraysInstancedEXT/glDrawElementsInstancedEXT, but they're extensions, so you'll have to access them as such.
Either way, the general idea is fairly simple: you have one mesh, and multiple transforms specifying where to draw the mesh, then you step through and draw the mesh with the different transforms. Note, however, that this doesn't necessarily give a major improvement -- it depends on the implementation (even more than most things do).
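One possible shape for the "multiple transforms" part is a small per-instance record that gets uploaded once per frame and read by the vertex shader, for example as an instanced attribute via glVertexAttribDivisor (core since OpenGL 3.3). The sketch below is only the CPU side, with made-up names; `applyInstance` is a plain C++ reference for the arithmetic the vertex shader would perform.

```cpp
#include <cmath>
#include <vector>

struct Placement { float x, y, angle; };  // per-polygon position and rotation

// 16 bytes per instance: translation plus a pre-scaled rotation.
struct InstanceData {
    float tx, ty;  // translation
    float c, s;    // scale * cos(angle), scale * sin(angle)
};

std::vector<InstanceData> buildInstanceData(
        const std::vector<Placement>& placements, float scale) {
    std::vector<InstanceData> data;
    data.reserve(placements.size());
    for (const Placement& p : placements) {
        data.push_back({p.x, p.y,
                        scale * std::cos(p.angle),
                        scale * std::sin(p.angle)});
    }
    return data;
}

// CPU reference for what the vertex shader would compute per vertex.
void applyInstance(const InstanceData& inst, float vx, float vy,
                   float& outX, float& outY) {
    outX = inst.c * vx - inst.s * vy + inst.tx;
    outY = inst.s * vx + inst.c * vy + inst.ty;
}
```

For 50k polygons this is about 800 KB of instance data, and the polygon's own vertices are stored only once.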

DirectX9 - Efficiently Drawing Sprites

I'm trying to create a platformer game, and I am taking various sprite blocks, and piecing them together in order to draw the level. This requires drawing a large number of sprites on the screen every single frame. A good computer has no problem handling drawing all the sprites, but it starts to impact performance on older computers. Since this is NOT a big game, I want it to be able to run on almost any computer. Right now, I am using the following DirectX function to draw my sprites:
// sprite: my LPD3DXSPRITE object; texture: the texture pointer for this sprite
D3DXVECTOR3 center(0.0f, 0.0f, 0.0f);
D3DXVECTOR3 position(static_cast<float>(x), static_cast<float>(y), z);
sprite->Draw(texture, NULL, &center, &position, D3DCOLOR_ARGB(a, r, g, b));
Is there a more efficient way to draw these pictures on the screen? Is there a way that I can use less complex picture files (I'm using regular png's right now) to speed things up?
To sum it up: What is the most performance friendly way to draw sprites in DirectX? thanks!
The ID3DXSprite interface you are using is already pretty efficient. Make sure all your sprite draw calls happen in one batch, if possible, between the sprite Begin and End calls. This allows the sprite interface to arrange the draws in the most efficient way.
For extra performance you can load multiple smaller textures into one larger texture and use texture coordinates to get them out. This makes it so textures don't have to be swapped as frequently.
The file type you are using for the textures does not matter as long as they are preloaded into textures. Make sure you load them all into textures once when the game/level is loading. Once you have loaded them into textures it does not matter what format they were originally in.
If you still are not getting the performance you want, try using PIX to profile your application and find where the bottlenecks really are.
This is too long to fit in a comment, so I will edit this post.
When I say swapping textures I mean binding them to a texture stage with SetTexture. Each time SetTexture is called there is a small performance hit as it changes the state of the texture stage. Normally this delay is fairly small, but can be bad if DirectX has to pull the texture from system memory to video memory.
ID3DXSprite will reorder the draws that are between the Begin and End calls for you, provided Begin is called with the D3DXSPRITE_SORT_TEXTURE flag. This means SetTexture will typically only be called once for each texture regardless of the order you draw them in.
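To see why grouping by texture matters, here is a small stand-alone sketch (not D3DX code; `SpriteDraw` and both functions are hypothetical) that counts how many texture binds a list of draws would need, before and after sorting it by texture. Reordering like this is only safe when the sprites do not overlap or draw order does not matter.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct SpriteDraw {
    int textureId;  // ASSUMED non-negative identifier of the sprite's texture
    float x, y;
};

// Number of times the bound texture would change, counting the first bind.
std::size_t countTextureSwitches(const std::vector<SpriteDraw>& draws) {
    std::size_t switches = 0;
    int current = -1;  // nothing bound yet
    for (const SpriteDraw& d : draws) {
        if (d.textureId != current) {
            ++switches;
            current = d.textureId;
        }
    }
    return switches;
}

// Groups draws by texture while keeping the relative order within a texture.
void sortByTexture(std::vector<SpriteDraw>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const SpriteDraw& a, const SpriteDraw& b) {
                         return a.textureId < b.textureId;
                     });
}
```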
It is often worth loading small textures into a large one. For example if it were possible to fit all small textures in to one large one, then the texture stage could just stay bound to that texture for all draws. Normally this will give a noticeable improvement, but testing is the only way to know for sure how much it will help. It would look terrible, but you could just throw in any large texture and pretend it is the combined one to test what performance difference there would be.
I agree with dschaeffer, but would like to add that if you are using a large number of different textures, it may be better to smush them together onto a single (or a few) larger textures and adjust the texture coordinates for different sprites accordingly. Texture state changes cost a lot, and this may speed things up on older systems.
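With a combined texture, "adjusting the coordinates" for ID3DXSprite means passing a source rectangle instead of NULL as the second argument of Draw. A sketch of computing that rectangle, assuming the small images are packed in a regular grid of equally sized cells; `Rect` is a stand-in with the same four fields as the Win32 RECT and `atlasCellRect` is a made-up helper.

```cpp
// Stand-in with the same fields as the Win32 RECT structure.
struct Rect { long left, top, right, bottom; };

// Pixel rectangle of cell `index` in an atlas laid out row by row,
// ASSUMING every cell is cellWidth x cellHeight with no padding.
Rect atlasCellRect(int index, int cellWidth, int cellHeight, int atlasCols) {
    const long col = index % atlasCols;
    const long row = index / atlasCols;
    Rect r;
    r.left = col * cellWidth;
    r.top = row * cellHeight;
    r.right = r.left + cellWidth;
    r.bottom = r.top + cellHeight;
    return r;
}
```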

Efficient way of drawing in OpenGL ES

In my application I draw a lot of cubes through OpenGL ES Api. All the cubes are of same dimensions, only they are located at different coordinates in space. I can think of two ways of drawing them, but I am not sure which is the most efficient one. I am no OpenGL expert, so I decided to ask here.
Method 1, which is what I use now: Since all the cubes are of identical dimensions, I calculate the vertex buffer, index buffer, normal buffer and color buffer only once. During a refresh of the scene, I go over all cubes, do bufferData() for the same set of buffers and then draw the triangle mesh of the cube using a drawElements() call. Since each cube is at a different position, I translate the mvMatrix before I draw. bufferData() and drawElements() are executed for each cube. In this method, I probably save a lot of memory by not calculating the buffers every time. But I am making a lot of drawElements() calls.
Method 2 would be: Treat all cubes as a set of polygons spread all over the scene. Calculate vertex, index, color and normal buffers for each polygon (actually triangles within the polygons) and push them to graphics card memory in a single call to bufferData(). Then draw them with a single call to drawElements(). The advantage of this approach is that I do only one bindBuffer and drawElements call. The downside is that I use a lot of memory to create the buffers.
My experience with OpenGL is too limited to know which of the above methods is better from a performance point of view.
I am using this in a WebGL app, but it's a generic OpenGL ES question.
I implemented method 2 and it wins by a landslide. The supposed downside of high memory use seemed to be only my imagination. In fact the garbage collector got invoked only once in method 2, while it was invoked 4-5 times in method 1.
Your OpenGL scenario might be different from mine, but if you reached here in search of performance tips, the lesson from this question is: Identify the parts in your scene that don't change frequently. No matter how big they are, put them in single buffer set (VBOs) and upload to graphics memory minimum number of times. That's how VBOs are meant to be used. The memory bandwidth between client (i.e. your app) and graphics card is precious and you don't want to consume it often without reason.
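For reference, the merging step of method 2 can be sketched roughly as below. This is a C++ illustration, not the WebGL code I used: the names are made up, only positions are merged (per-face normals and colors would need 24 vertices per cube rather than 8), and indices are 16-bit because that is what OpenGL ES 2.0 and WebGL 1 support without an extension, which caps this sketch at 8192 cubes per buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

struct Mesh {
    std::vector<Vec3> vertices;
    std::vector<std::uint16_t> indices;
};

// Appends one cube per center into a single vertex and index array.
Mesh mergeCubes(const std::vector<Vec3>& centers, float halfSize) {
    static const Vec3 corners[8] = {
        {-1.f, -1.f, -1.f}, {1.f, -1.f, -1.f}, {1.f, 1.f, -1.f}, {-1.f, 1.f, -1.f},
        {-1.f, -1.f,  1.f}, {1.f, -1.f,  1.f}, {1.f, 1.f,  1.f}, {-1.f, 1.f,  1.f}};
    // 12 triangles, counter-clockwise when viewed from outside the cube.
    static const std::uint16_t cubeIndices[36] = {
        0, 2, 1,  0, 3, 2,   // back   (z = -1)
        4, 5, 6,  4, 6, 7,   // front  (z = +1)
        0, 4, 7,  0, 7, 3,   // left   (x = -1)
        1, 2, 6,  1, 6, 5,   // right  (x = +1)
        0, 1, 5,  0, 5, 4,   // bottom (y = -1)
        3, 7, 6,  3, 6, 2};  // top    (y = +1)
    Mesh mesh;
    for (const Vec3& c : centers) {
        // Stop before 16-bit indices would overflow.
        if (mesh.vertices.size() + 8 > 65536) break;
        const std::size_t base = mesh.vertices.size();
        for (const Vec3& k : corners) {
            mesh.vertices.push_back({c.x + k.x * halfSize,
                                     c.y + k.y * halfSize,
                                     c.z + k.z * halfSize});
        }
        for (std::uint16_t idx : cubeIndices) {
            mesh.indices.push_back(static_cast<std::uint16_t>(base + idx));
        }
    }
    return mesh;
}
```

The two arrays are uploaded once with bufferData() and drawn with a single drawElements() call, as long as the cubes stay where they are.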
Read the section "Vertex Buffer Objects" in Ch. 6 of "OpenGL ES 2.0 Programming Guide" to understand how they are supposed to be used.
I know that this question is already answered, but I think it's worth pointing out the Google IO presentation about WebGL optimization:
They cover, essentially, this exact same issue (lots of identical shapes with different colors/positions) and talk about some great ways to optimize such a scene (and theirs is dynamic too!)
I propose the following approach:
On load:
1. Generate the coordinates buffer (for one cube) and load it into a VBO (gl.glGenBuffers, gl.glBindBuffer)
On draw:
1. Bind the buffer (gl.glBindBuffer)
2. Draw each cell (loop)
2.1. Move the current position to the center of the current cube (gl.glTranslatef(position.x, position.y, position.z))
2.2. Draw the current cube (gl.glDrawArrays)
2.3. Move the position back (gl.glTranslatef(-position.x, -position.y, -position.z))
