I want to implement the "depth peeling" in webgl but the problem is that there is no occlusion query so I don't know how to check when the "peeling" of the scene is over.
Do you see an other way to do that?
The usual approach is to limit the peeling to a certain amount of steps. This is sometimes even better than using occlusion queries because to many layers of transparent structure become close to impossible to discern from each other. It often helps to know what you are exactly rendering to get a good estimate of the amount of layers you need to peel.
I recently implemented depth peeling in webgl. There are a few limiting factors that make it kinda hard to do as many peels as layers. Mainly a very limited amount of texture units and the fact you can only render to one target at a time, so you have to render color and depth seperately. With 7 textures used I can do 4 peels. That already takes 11 render passes per frame. To do more peels you would need to do a bit more sophisticated merging of intermediate results. I doubt you gain much from more peels.
Related
I'm currently working on a game using Three.js. I've been studying software engineering for four years and have been working professionally on backends for two, but I've barely touched on graphics aside from some simple Unity experimenting.
I currently have ~22,000 vertices and ~8,000 faces according to renderstats.js, and my desktop (above average) can't run it above 20 FPS. I'm using Lambert material as well as a single ambient light, so I feel like this isn't too much to ask.
With these figures in mind, is this the expected behavior for three.js rendering?
I would be pretty sure that is not end of the line and you are probably missing some possibilities for massive performance-improvements.
But just to give you some numbers first,
if you leave everything fancy away (including three.js) and just render an ultra-simple point-cloud with one fragment rendered per point, you can easily get to rendering 10-20 million (yes, million) points/vertices on an average GPU.
just with simple shapes and material, I already got three.js to render something in the range of 500k triangles (at 1080p-resolution) at 60FPS without problem. You can probably take those numbers times 10 for latest high-end GPUs.
However, these kinds of numbers are not really helpful.
Some hints:
if you want to debug your rendering-performance, you should first add some metrics. Renderstats is good, but I'd recommend integrating http://spite.github.io/rstats/ for this (see the example).
generally the choice of material shouldn't matter too much, the GPU is way more capable than most people think. It's more likely a problem somewhere else in the pipeline. EDIT from comment: In some cases, like hi-resolution displays with slow GPUs (think mobile-devices) this might be less true and complicated shader-code can slow down your site, but it might worth be looking at the other points first. As the rendering itself happens off-thread (so you can't measure it's duration using regular tools like the devtools-profiler), you can use the EXT_disjoint_timer_query-extension to get some information about what is going on on the GPU.
the number of drawcalls shouldn't be too high: three.js needs to do a single drawcall for every Mesh and Points-object rendered in the scene and too many objects are generally a far bigger problem than objects with lots of vertices. Reducing the number of drawcalls can be done by merging multiple geometries into one and making use of multi-materials, vertex-colors and things like that.
if you are doing postprocessing, the GPU needs to render every pixel on screen several times. This might as well massively limit your performance. This can be optimized by merging multiple postprocessing-passes into one (I admit, that'd be a lot of hard work..)
another problem could be on the JS side: you should use the profiler or timeline-view from the chrome devtools to see if maybe it's the javascript that is taking too much time per frame (shouldn't be more than 8-12ms per frame). I've been told there are ways to optimize the javascript-performance as well :)
I have a retro-looking 2D game with a lot of sprites (reminiscent of Sega's Super Scaler arcades) which do not use semi-transparency. I have thought about using the Z-Buffer over sorting to simplify things. Ok, but by default writes are done to the Z-buffer even though alpha is zero, giving the effect illustrated here:
http://i.stack.imgur.com/ubLlp.png
Now, since I'm in OpenGL ES 2, I don't have alpha testing, so from what I understand my only possibility is to discard the pixel from the fragment shader if alpha is 0 so that it doesn't get written to the Z-Buffer. But in terms of performance this is SO wrong: not only the if is slow, but the discard basically kills the purpose since it disables early depth testing and the result is way worse than doing it in software.
if (val.a < 0.5) {
discard;
}
Is there any other solution I could use which would not kill the performance? Do all 2D games sort sprites themselves and not use depth buffer?
It's a tradeoff really. If you let the z-buffer do the sorting and use discard in your shaders then it's more expensive on the GPU because of branching and late depth testing as you say.
If you do the depth sorting yourself, then you'll find it's harder to issue your draw calls in an optimal order (e.g. you'll keep having to change texture). Draw calls on GLES2 have a very significant CPU hit on lower end devices and the count will probably go up.
If performance is a big concern, then probably the second option is better if you do it in conjunction with a big effort on the texture atlasing front to minimize your draw call count, this might be particularly effective if your sprites are low resolution retro sprites because you'll be able to get a lot of sprites per texture atlas. It isn't a clear winner by any stretch and I can imagine that different games take different approaches.
Also, you should take into account that the vast majority of target hardware is going to perform just fine whichever path you choose, and maybe you should just choose the one that is faster to implement and makes your code simpler (which is probably letting the z-buffer do the sorting).
If you fancy a technical challenge, I've often thought the best approach might be divide up your sprites into fully opaque sections and sections with transparency and render the two parts as separate meshes (they won't be quads any more). You'd have to do a lot of preprocessing and draw a lot more triangles, but by being able to do some rendering with fully-opaque parts then you can take advantage of the hidden-surface-removal tech in all iOS devices and lots of Android devices. Certainly by doing this you should be able to reduce your fill rate burden, but at a cost of increased draw calls, and there might be an unnecessarily high amount of added complexity to your code and your tools.
I'm implementing a simple lightning effect for my 3D game, something like this:
http://www.krazydad.com/bestiary/bestiary_lightning.html
I'm using opengl ES 2.0. I'm pondering what the best looking and most performance efficient way to render this in a 3D environment is though, as the lines making up the electric bolt needs to be looking "solid" when viewed from any angle.
I was thinking to generate two planes for each line segment, in an X cross to create an effect of line thickness. Rendering by disabling depth buffer writes, using some kind off additive blending mode. Texturing each line segment using an electric looking texture with an alpha channel.
I'm a bit worried about the performance hit from generating the necessary triangle lists using this method though, as my game will potentially have a lot of lightning bolts generated at the same time. But as the length and thickness of the lightning bolts will vary a lot, I doubt it would look good to simply use an animated 3D object of an lightning bolt, stretched and pointing to the right location, which was my initial idea.
I was thinking of an alternative approach where I render the lightning bolts using 2D lines between projected end points in a post processing pass. That should work well since the perspective effect in my case is negligible, except then it would be tricky to have the lines appear behind occluding objects.
Any good ideas on the best approach here?
Edit: I found this white paper from nVidia:
http://developer.download.nvidia.com/SDK/10/direct3d/Source/Lightning/doc/lightning_doc.pdf
Which uses an approach with having billboards for each line segment, then apply some filtering to smooth the resulting gaps and overlaps from each billboard.
Seems to yield pretty good visual results, however I am not too happy about the additional filtering pass as the game is for mobile phones where such a step is quite costly. And, as it turns out, billboarding is quite CPU expensive too, due to the additional matrix calculation overhead, which is slow on mobile devices.
I ended up doing something like the nVidia paper suggested, but to prevent the need for a postprocessing step I used different kind of textures for different kind of branching angles, to avoid gaps and overlaps of the segment corners, which turned out quite well. And to avoid the expensive billboard matrix calculation I instead drew the line segments using a more 2D approach, but calculating the depth value manually for each vertex in the segments. This yields both acceptable performance and visuals.
An animated texture, possibly powered by a shader, is likely the fastest way to handle this.
Any geometry generation and rendering will limit the quality of the effect, and may take significantly more CPU time, memory bandwidth and draw calls.
Using a single animated texture on a quad, or a shader creating procedural lightning, will give constant speed and make the effect much simpler to implement. For that, this question may be of interest.
I am rendering some meshes (sometimes upwards of 500) and I wanted to know the best way to approach this. Would it be pointless to create 500 VBOs and then if they pass the frustum and visibility tests, render them. Is there a more efficient way to do this? I am looking to maximize performance.
To answer your question, yes, many VBOs will slow things down. More polys will usually slow down the render, but more draw calls has a much greater hit. You want to minimize state changes and draws, as well as the number of buffers you have (and memory use).
I would suggest first looking at the buffers and figuring out how many you need. If you can batch/instance geometry, merge static geometry into a single buffer, reuse buffer more efficiently, etc.
Once you've cut the buffers down to the minimum possible, you'll want to use culling of multiple sorts. Visibility, both by frustrum (perhaps in an octree) and occlusion, can provide a significant performance boost. The main idea is to disqualify the geometry as fast and simply as possible, so you start with rough tests (octree), then somewhat more detailed (perhaps an AABB and/or simplified hull), then occlusion, then actually draw.
Here's a good article on frustrum culling, which touches a bit on quadtrees (and by extension, octrees). Diagrams, explanations and some sample code.
OpenGL occlusion culling articles seem a bit less common, although this one from GPU Gems might be a good starting place.
This is a difficult question to search in Google since it has other meaning in finance.
Of course, what I mean here is "Drawing" as in .. computer graphics.. not money..
I am interested in preventing overdrawing for both 3D Drawing and 2D Drawing.
(should I make them into two different questions?)
I realize that this might be a very broad question since I didn't specify which technology to use. If it is too broad, maybe some hints on some resources I can read up will be okay.
EDIT:
What I mean by overdrawing is:
when you draw too many objects, rendering single frame will be very slow
when you draw more area than what you need, rendering a single frame will be very slow
It's quite complex topic.
First thing to consider is frustum culling. It will filter out objects that are not in camera’s field of view so you can just pass them on render stage.
The second thing is Z-sorting of objects that are in camera. It is better to render them from front to back so that near objects will write “near-value” to the depth buffer and far objects’ pixels will not be drawn since they will not pass depth test. This will save your GPU’s fill rate and pixel-shader work. Note however, if you have semitransparent objects in scene, they should be drawn first in back-to-front order to make alpha-blending possible.
Both things achievable if you use some kind of space partition such as Octree or Quadtree. Which is better depends on your game. Quadtree is better for big open spaces and Octree is better for in-door spaces with many levels.
And don't forget about simple back-face culling that can be enabled with single line in DirectX and OpenGL to prevent drawing of faces that are look at camera with theirs back-side.
Question is really too broad :o) Check out these "pointers" and ask more specifically.
Typical overdraw inhibitors are:
Z-buffer
Occlusion based techniques (various buffer techniques, HW occlusions, ...)
Stencil test
on little bit higher logic level:
culling (usually by view frustum)
scene organization techniques (usually trees or tiling)
rough drawing front to back (this is obviously supporting technique :o)
EDIT: added stencil test, has indeed interesting overdraw prevention uses especially in combination of 2d/3d.
Reduce the number of objects you consider for drawing based on distance, and on position (ie. reject those outside of the viewing frustrum).
Also consider using some sort of object-based occlusion system to allow large objects to obscure small ones. However this may not be worth it unless you have a lot of large objects with fairly regular shapes. You can pre-process potentially visible sets for static objects in some cases.
Your API will typically reject polygons that are not facing the viewpoint also, since you typically don't want to draw the rear-face.
When it comes to actual rendering time, it's often helpful to render opaque objects from front-to-back, so that the depth-buffer tests end up rejecting entire polygons. This works for 2D too, if you have depth-buffering turned on.
Remember that this is a performance optimisation problem. Most applications will not have a significant problem with overdraw. Use tools like Pix or NVIDIA PerfHUD to measure your problem before you spend resources on fixing it.