I have several thousand quads to draw, some of which might fall entirely outside the viewport. I could write code which will detect which quads fall wholly outside viewport and ask OpenGL to draw only those which will be at least partially visible. Alternatively, I could simply have OpenGL draw all of the quads, regardless of whether they intersect with the viewport.
I don't have enough experience with OpenGL to know if one of these is obviously better (or if OpenGL offers some quick viewport intersection test I can use). Are draws outside the viewport close to being no-ops, or are they expensive enough that I should try to avoid them?
It depends on your circumstances.
Drawing is best done in batches, preferably batches that are static in structure (ie: each batch is drawn in its entirety). So you shouldn't be culling down at the quad level. But doing some culling of large groups of quads is not unwelcome.
The primary cost you'll pay is vertex transform (aka: your vertex shader). A vertex shader has to be run on every vertex you provide, regardless of anything else. However, hardware will discard triangles that are trivially outside of the viewport, so you won't soak up any fillrate or other per-fragment work.
However, that doesn't mean it's fine to render lots of off-screen geometry just because your vertex T&L is cheap. Rendering large blocks of triangles that aren't visible may very well stall the rasterizer, because all of those triangles are being culled. That is, if you draw a lot of stuff that gets culled for being off screen, fillrate that could have been spent on actually visible triangles may be lost.
So it's not a good idea to just hurl geometry at the GPU willy-nilly.
In any case, if you're doing 2D rendering, coarse culling of discrete groups of quads is really all you need. You could divide your tilemap into screen-sized portions and draw up to 4 of these based on the position of the camera.
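As a rough illustration of that idea, here is a minimal sketch of picking the at-most-4 screen-sized chunks the camera can overlap. It's not tied to any particular engine; the `Camera` shape and the chunk indexing are made up for the example.

```typescript
// Pick the (at most 4) screen-sized chunks that a screen-sized view can overlap.
// Camera (x, y) is assumed to be the top-left of the view, in pixels.
interface Camera { x: number; y: number; }

function visibleChunks(
  cam: Camera,
  screenW: number, screenH: number,
  chunkCols: number, chunkRows: number
): Array<[number, number]> {
  const firstCol = Math.floor(cam.x / screenW);
  const firstRow = Math.floor(cam.y / screenH);
  const result: Array<[number, number]> = [];
  // A screen-sized view can straddle at most a 2x2 block of screen-sized chunks.
  for (let row = firstRow; row <= firstRow + 1; row++) {
    for (let col = firstCol; col <= firstCol + 1; col++) {
      if (col >= 0 && col < chunkCols && row >= 0 && row < chunkRows) {
        result.push([col, row]);
      }
    }
  }
  return result; // draw only these batches; everything else is skipped wholesale
}
```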
Is it a reasonable optimization to omit calls to ID2D1HwndRenderTarget::DrawBitmap() if the image will end up outside the visible area? If I implement the checking logic in the application, that will cost some performance, so if the first thing D2D does is the same check, then I'd rather not do it myself.
I ran a test with my application, which renders some UI parts using Direct2D (with RenderDoc attached), and the behaviour seems to be a bit random.
I render a mix of Rectangles, Text, Path geometries (beziers) and Rectangles with a bitmap brush (which should be equivalent to your DrawBitmap call).
Then I captured a frame with all those objects visible, and another one after panning my UI (using a transform) so the objects are not visible.
From there I could check what was drawn and what was not:
Text is always culled.
Solid-color rectangles are not culled.
Most of the time, path geometries are culled, but sometimes not.
Rectangles with a bitmap brush are NEVER culled.
So it seems Direct2D is making different decisions depending on the types of elements you plan to draw.
Since rectangles are easily batched and cheap to draw, it seems that they are just drawn regardless.
Text and most path geometries require more work, so it seems they are effectively culled (although, oddly, the bitmap-brush rectangles were not).
Path geometry culling looked to depend on how many polygons the geometry is tessellated into (I had a path that translated to 26 primitives and was not culled, and another that translated to 120 and was culled).
So you can trust Direct2D to perform that optimization, but I would personally implement a quick rectangle-to-rectangle check just in case (it's not going to hurt your performance, as it's an extremely simple operation).
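For reference, the check really is tiny. Here is a sketch; the `RectF` shape mirrors Direct2D's left/top/right/bottom layout, but the names are illustrative.

```typescript
// Axis-aligned rectangle overlap test; coordinates follow Direct2D's
// left/top/right/bottom convention. Skip the draw call when this returns false.
interface RectF { left: number; top: number; right: number; bottom: number; }

function rectsOverlap(a: RectF, b: RectF): boolean {
  return a.left < b.right && b.left < a.right &&
         a.top < b.bottom && b.top < a.bottom;
}

// e.g. only issue DrawBitmap when the destination rect touches the render target:
// if (rectsOverlap(destRect, { left: 0, top: 0, right: targetWidth, bottom: targetHeight })) { ... }
```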
I am creating a 3D reservoir model.
It's made of hundreds of thousands of cells with outline. The outline is needed for all cells underneath, because there is an IJK filter used to hide cells on any level and thus show the rest. Once the model is rendered, it shouldn't need to be updated in terms of position or scale.
That's enough about the background. The approach I'm using is to create one large geometry, which stores all vertices across the reservoir in one triangle strip. It also stores an IJK index for each cell, so the IJK filter works at the shader level. That creates the mesh part. Then I create another object to draw all the outlines using one THREE.LineSegments.
The approach works pretty well for a small number of cells, but for a large data set the frame rate drops.
I'm proposing another way of doing this, using barycentric outlines and instanced drawing. Barycentric outline drawing removes the extra LineSegments object, since it draws the outline in the fragment shader. However, it comes with drawbacks: because WebGL has no geometry shaders, I have to use full triangles rather than a triangle strip in order to store barycentric coordinates for each vertex. I'm OK with this extra memory usage if instanced drawing can boost the performance. That is to say, I draw one cube with an outline, then create as many instances as I need and put them in the right positions.
I am wondering whether this approach will indeed increase performance, at least in theory. Any thoughts are welcome!
Ok I think I am gonna answer this question myself.
I implemented the change based on the ideas above and it works pretty well compared to the original version.
Let's put the result first: this approach has no problem rendering hundreds of thousands of cells at a reasonable frame rate. My demo contains 400,000 cells, with the frame rate at 50 fps in the worst case, running on my Nvidia 1050 Ti and a 4K monitor. For comparison, drawing 400,000 cells in the previous version could drop the frame rate to 10 fps.
This means using instanced drawing for a large object is faster than composing a single large geometry. It also helps rendering performance that the instanced cube is rendered single-sided, while the triangle-stripped cube was two-sided. Once I can draw a single unit cube with the ideal outline, I can transform it to any place, in "any" shape, in the vertex shader. But of course instanced drawing comes with its restrictions: each cell doesn't have to be the same shape, but it does have to have the same number of vertices, faces, etc.; and I lose the ability to change per-vertex colors...
As for memory usage, the new approach actually uses less. I provide positions for 8 vertices per cell, instead of 14. Even though the unit cube has 36 vertices, I can use each unit-cube vertex position as an index for every subsequent instance. That is, for the 36 unit vertices (0/1, 0/1, 0/1), I only need to provide 8 real positions per cell.
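For anyone curious what the setup roughly looks like, here is a minimal three.js sketch of the instancing side, assuming a recent three.js build. The attribute name `instanceOffset` and the trivial shaders are placeholders: the real version feeds the 8 corner positions plus the IJK index per instance and reshapes the unit cube in the vertex shader.

```typescript
import * as THREE from 'three';

// One non-indexed unit cube (36 vertices as full triangles, as needed for barycentric outlines).
const base = new THREE.BoxGeometry(1, 1, 1).toNonIndexed();
const geometry = new THREE.InstancedBufferGeometry();
geometry.setAttribute('position', base.getAttribute('position'));
geometry.setAttribute('normal', base.getAttribute('normal'));
geometry.setAttribute('uv', base.getAttribute('uv'));

const cellCount = 400_000;
geometry.instanceCount = cellCount;

// Placeholder per-instance data: one translation per cell. The real version would
// supply the 8 corner positions (and the IJK index) of each cell instead.
const offsets = new Float32Array(cellCount * 3);
// ... fill offsets from the reservoir data ...
geometry.setAttribute('instanceOffset', new THREE.InstancedBufferAttribute(offsets, 3));

const material = new THREE.ShaderMaterial({
  vertexShader: `
    attribute vec3 instanceOffset;
    void main() {
      // "position" is the shared unit-cube vertex; in the full implementation it also
      // serves as the index into the 8 real corner positions of the cell.
      gl_Position = projectionMatrix * modelViewMatrix * vec4(position + instanceOffset, 1.0);
    }`,
  fragmentShader: `
    void main() { gl_FragColor = vec4(0.6, 0.7, 0.8, 1.0); }`,
});

const cells = new THREE.Mesh(geometry, material);
```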
Hope this helps for people who want to implement the same optimization.
OpenGL question: I have something to ask about the clip-space transformation. I am reading an online tutorial and it says that everything you draw outside the clip space will be clipped. Given that, do elements outside the clip space affect performance or not? Since they will not be drawn, it seems they shouldn't.
Assuming that they do affect performance, in the case of a 2D game like Super Mario I am thinking about not drawing elements that are outside the clip space in order to get better performance. Please clarify. Thanks.
OpenGL has only a certain amount of knowledge about your scene and will clip very late in the pipeline. It can't apply a broad phase test. Assuming you can, you should.
Supposing you had a model with 30,000 triangles, OpenGL would transform each and every one of those 30,000 triangles before considering clipping. If you know something as simple as the bounding sphere for the model it's possible you could see that the whole thing is completely outside of the frustum in a single test and save almost 30,000 extra bits of effort.
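A sketch of what that single broad-phase test looks like, assuming you already have the six frustum planes (for example, extracted from the combined projection and view matrix) stored as inward-pointing normals with an offset:

```typescript
// Sphere-vs-frustum broad phase. Planes are n·p + d = 0 with normals pointing inward.
interface Plane { nx: number; ny: number; nz: number; d: number; }
interface Sphere { x: number; y: number; z: number; radius: number; }

function sphereOutsideFrustum(s: Sphere, planes: Plane[]): boolean {
  for (const p of planes) {
    const dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
    if (dist < -s.radius) return true;   // wholly behind one plane: cull the whole model
  }
  return false; // inside or intersecting: submit the draw and let the GPU clip per-triangle
}
```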
In a 2d game like Mario what this usually means is using the scroll position to index into the map and to generate geometry only for potentially visible tiles and sprites that are within the visible area.
For the map, that will generally just mean figuring out the (x, y) of one corner and then generating geometry for the known width and height of the screen, which means discarding the vast majority of the geometry with zero processing.
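Concretely, that corner-plus-screen-size lookup can be as small as this (tile size and map dimensions here are illustrative):

```typescript
// Turn the scroll position into a tile index window, so only potentially
// visible tiles ever produce geometry.
function visibleTileRange(
  scrollX: number, scrollY: number,   // camera top-left, in pixels
  screenW: number, screenH: number,
  tileSize: number,
  mapCols: number, mapRows: number
) {
  const firstCol = Math.max(0, Math.floor(scrollX / tileSize));
  const firstRow = Math.max(0, Math.floor(scrollY / tileSize));
  const lastCol = Math.min(mapCols - 1, Math.floor((scrollX + screenW - 1) / tileSize));
  const lastRow = Math.min(mapRows - 1, Math.floor((scrollY + screenH - 1) / tileSize));
  return { firstCol, firstRow, lastCol, lastRow }; // build quads for just this window
}
```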
For the sprites, this is generally why in those sort of games you often see enemies reset to their starting position if you walk a little way from them and then walk back: they're added to the active list based on a map location trigger and removed when you walk far enough away. While not active, no mutable storage is afforded to them.
Suppose I have a 3D model:
The model is given in the form of vertices, faces (all triangles) and normal vectors. The model may have holes and/or transparent parts.
For an arbitrarily placed light source at infinity, I have to determine:
[required] which triangles are (partially) shadowed by other triangles
Then, for the partially shadowed triangles:
[bonus] what fraction of the area of the triangle is shadowed
[superbonus] come up with a new mesh that describe the shape of the shadows exactly
My final application has to run on headless machines, that is, they have no GPU. Therefore, all the standard things from OpenGL, OpenCL, etc. might not be the best choice.
What is the most efficient algorithm to determine these things, considering this limitation?
Do you have a single mesh or multiple meshes?
Meaning: is the shadow projected onto a single 'ground' surface, or onto more surfaces such as room walls or even nearby objects? Depending on this, the solutions are very different.
For flat ground/wall surfaces, the best approach is usually a projected render onto that surface:
The camera direction is opposite to the light direction and the render target is the surface. The surface is usually not perpendicular to the light, so you need to use a projection to compensate. You need one render pass for each target surface, so this is not suitable if the shadow falls onto a nearby mesh (it is just for ground/walls).
For more complicated scenes, you need a more advanced approach. There are quite a number of them, and each has its advantages and disadvantages. I would use a voxel map, but if you are limited by space then some stencil/vector approach will be better. Of course, all of these techniques are quite expensive, and without a GPU I would not even try to implement them.
If you want just self-shadowing, then the voxel map can be limited to a bounding box around your mesh; in that case you do not incorporate the whole mesh volume, just the projection of each surface point along the light direction (ignoring the first voxel) to avoid putting shadow on the lit surface itself.
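If all you need is the [required] part (which triangles receive any shadow) and the scene is small enough, a brute-force CPU baseline, distinct from the voxel approach above, is to cast rays from sample points on each triangle toward the light and test them against the other triangles. A sketch of the standard Moller-Trumbore ray/triangle test (all names here are illustrative):

```typescript
// Brute-force baseline (not the voxel approach above): a surface sample point is in
// shadow if a ray from it toward the light hits any other triangle.
type Vec3 = [number, number, number];

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

function rayHitsTriangle(orig: Vec3, dir: Vec3, v0: Vec3, v1: Vec3, v2: Vec3): boolean {
  const EPS = 1e-7;
  const edge1 = sub(v1, v0);
  const edge2 = sub(v2, v0);
  const h = cross(dir, edge2);
  const a = dot(edge1, h);
  if (Math.abs(a) < EPS) return false;          // ray is parallel to the triangle
  const f = 1 / a;
  const s = sub(orig, v0);
  const u = f * dot(s, h);
  if (u < 0 || u > 1) return false;
  const q = cross(s, edge1);
  const v = f * dot(dir, q);
  if (v < 0 || u + v > 1) return false;
  return f * dot(edge2, q) > EPS;               // intersection lies in front of the origin
}

// For a light at infinity, dir is the fixed direction toward the light; a sample point is
// shadowed if rayHitsTriangle(point, dir, ...) is true for any triangle that does not
// contain it. A BVH or grid over the triangles keeps this from being O(n^2) per point.
```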
There is a great article about multiple light sources in GLSL
http://en.wikibooks.org/wiki/GLSL_Programming/GLUT/Multiple_Lights
But the light0 and light1 parameters are defined in the shader code. What if I must draw flare-gun shots, where every flare has its own position and color and must illuminate its surroundings? How do we manage the other objects' shaders to deal with flare positions and colors that are not known in advance (there is a limit to the maximum number of flares on screen)? For example, if there can be at most 8 flares on screen, must I pass 8*2 uniforms even when the flares don't exist at that moment?
Or imagine you are making a level editor where the user can place lamps: how will the other objects "know" about the new light source and render it once a new lamp has been added?
I think there must be a clever solution, but I can't find one.
Lighting equations usually rely on additive colour. So the output is the colour of light one plus the colour of light two plus the colour of light three, etc.
One of the in-framebuffer blending modes offered by OpenGL is additive blending. So the colour output of anything new that you draw will be added to whatever is already in the buffer.
The most naive solution is therefore to write your shader to do exactly one light. If you have multiple lights, draw the scene that many times, each time with a different nominated light. It's an example of multipass rendering.
Better solutions involve writing shaders to do two, four, eight or whatever lights at once, doing, say, 15 lights as an 8-light draw then a 4-light draw then a 2-light draw then a 1-light draw, and including only geometry within reach of each light when you do that pass. Which tends to mean finding intelligent ways to group lights by locality.
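To make the one-light-per-pass idea concrete, here is a WebGL-flavoured sketch; the `Light` type and the `drawScene`/`setLightUniforms` hooks are hypothetical stand-ins for your own renderer.

```typescript
interface Light { position: [number, number, number]; color: [number, number, number]; }

// Hypothetical hooks into your own renderer:
declare function setLightUniforms(light: Light): void;
declare function drawScene(): void;

function renderWithLights(gl: WebGLRenderingContext, lights: Light[]): void {
  // Pass 0: ordinary opaque render with the first light.
  gl.disable(gl.BLEND);
  gl.depthFunc(gl.LESS);
  setLightUniforms(lights[0]);
  drawScene();

  // Passes 1..n: redraw the same geometry, adding each further light's contribution.
  gl.enable(gl.BLEND);
  gl.blendFunc(gl.ONE, gl.ONE);   // additive: dst = dst + src
  gl.depthFunc(gl.LEQUAL);        // accept equal depth so the same surfaces re-shade
  gl.depthMask(false);            // depth has already been laid down
  for (const light of lights.slice(1)) {
    setLightUniforms(light);
    drawScene();
  }
  gl.depthMask(true);
}
```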
EDIT: with a little more thought, I should add that there's another option in deferred shading, though it's not completely useful on most GL ES devices at the moment due to the limited options for output buffers.
Suppose theoretically you could render your geometry exactly once and store whatever you wanted per pixel. So you wouldn't just output a colour, you'd output, say, a position in 3d space, a normal, a diffuse colour, a specular colour and a specular exponent. Those would then all be in a per-pixel buffer.
You could then render each light by (i) working out the maximum possible space it can occupy when projected onto the screen (so, a 2d rectangle that relates directly to pixels); and (ii) rendering the light as a single quad of that size, for each pixel reading the relevant values from the buffer you just set up and outputting an appropriately lit colour.
Then you'd do all the actual geometry in your scene only exactly once, and each additional light would cost at most a single, full-screen quad.
In practice you can't really do that, because the output buffers you tend to be able to use in ES provide too little storage. But what you can usually do is render to a 32-bit colour buffer with an attached depth buffer. So you can just store depth in the depth buffer and work out the world (x, y, z) from that plus the [uniform] position of the camera in the light shader. You could store 8-bit versions of normal x and y in the colour buffer, spending 16 bits, and work out z in the light shader because you know that the normal is always of unit length. Then, to pick a concrete example at random, maybe you could store a 16-bit version of the diffuse colour in the remaining space, possibly in YCrCb with extra storage for Y.
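Just to illustrate the normal-packing arithmetic in that paragraph, here is a sketch of the CPU-side equivalent; it assumes view-space normals with non-negative z (the usual caveat when only x and y are stored).

```typescript
// Pack normal x/y into two 8-bit channels; recover z from unit length on unpack.
function packNormalXY(nx: number, ny: number): [number, number] {
  const to8 = (v: number) => Math.round((v * 0.5 + 0.5) * 255);  // map [-1, 1] -> [0, 255]
  return [to8(nx), to8(ny)];
}

function unpackNormal(px: number, py: number): [number, number, number] {
  const from8 = (v: number) => (v / 255) * 2 - 1;
  const nx = from8(px), ny = from8(py);
  const nz = Math.sqrt(Math.max(0, 1 - nx * nx - ny * ny));      // unit length, z assumed >= 0
  return [nx, ny, nz];
}
```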
The main disadvantage is that hardware antialiasing then doesn't work, due to much the same sort of concerns as with transparency and depth buffers. But if you get to the point where you save dramatically on lighting, it might still make sense to do manual antialiasing by rendering a large version of the scene and then scaling it down in a final pass.