Guidelines for optimizing WebGL performance by minimizing shader/state changes - performance

I'm trying to get an idea of the practicality of WebGL for rendering large interior scenes, consisting of 100K's of triangles. These triangles are distributed over many objects, and there are many materials in the scene. On the other hand, there are no moving parts. And the materials tend to be fairly simple, mostly based on texture maps. There is a lot of texture map sharing; for example, all the chairs in the scene will share a common map. There is also some multitexturing - up to three textures overlaid in a material.
I've been doing a little experimentation and reading, and gather that frequently switching materials during a rendering pass will slow things down. For example, a scene with 200K triangles will have significant performance differences, depending on whether there are 10 or 1000 objects, assuming that each time an object is displayed a new material is set up.
So it seems that if performance is important, the scene should be sorted by materials so as to minimize material switching. What I'm looking for is guidelines on how to think about the overhead of various state changes, and where I get the biggest bang for the buck. For example:
what are the relative performance costs of, say, gl.useProgram(), gl.uniformMatrix4fv(), and gl.drawElements()?
should I try to write ubershaders to minimize shader switching?
should I try to aggregate geometry to minimize the number of gl.drawElements() calls?
I realize that mileage may vary depending on browser, OS, and graphics hardware. And I'm also not looking for heroic measures. Just some guidelines from people who have already had some experience in making scenes fast. I'll add that while I've had some experience with fixed-pipeline OpenGL programming in the past, I'm rather new to the WebGL/OpenGL ES 2.0 way of doing things.
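To make the idea concrete, here's a rough sketch of the kind of material-sorted draw loop I have in mind (drawList, item.program and item.material are made-up names, not from any real codebase):

```js
// Hypothetical draw list: each item carries a program, a material (textures +
// uniforms) and geometry (VBO/IBO). Sort so identical programs and materials
// end up adjacent, then only rebind state when it actually changes.
drawList.sort((a, b) =>
  a.program.id - b.program.id || a.material.id - b.material.id);

let currentProgram = null;
let currentMaterial = null;

for (const item of drawList) {
  if (item.program !== currentProgram) {
    gl.useProgram(item.program.glProgram);   // typically the most expensive switch
    currentProgram = item.program;
    currentMaterial = null;                  // textures/uniforms must be rebound
  }
  if (item.material !== currentMaterial) {
    item.material.bind(gl, item.program);    // bind textures, set material uniforms
    currentMaterial = item.material;
  }
  // (per-object vertex attribute setup omitted for brevity)
  gl.uniformMatrix4fv(item.program.uModelView, false, item.modelViewMatrix);
  gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, item.indexBuffer);
  gl.drawElements(gl.TRIANGLES, item.indexCount, gl.UNSIGNED_SHORT, 0);
}
```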

Have you read Batch, Batch, Batch? Admittedly, it focuses on DirectX, but the reasoning also applies, to a lesser extent, to OpenGL/WebGL: each API call has significant overhead on the CPU. The advice is to use all of the API's options to share textures, use instancing (if available), and write complex shaders to avoid many draw calls. So if you can draw the whole house as a single mesh in a single call, that would be better than 1000 calls, one for each room. Writing ubershaders is recommended, but mostly because it may allow you to remove draw calls, not because GPU state switching is expensive.
This assumes recent hardware. For low end platforms (iPad?) or Intel GMA chips, the bottlenecks will be elsewhere (like in software vertex processing).
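To illustrate the "single mesh in a single call" idea, here's a minimal sketch of merging several static meshes that share one material into a single buffer; the meshes layout is an assumption for illustration, not from any particular library:

```js
// "meshes" is a hypothetical array of { positions: Float32Array, indices: Uint16Array }.
// The result can be uploaded to one VBO/IBO and drawn with a single gl.drawElements call.
function mergeMeshes(meshes) {
  let vertexCount = 0, indexCount = 0;
  for (const m of meshes) {
    vertexCount += m.positions.length / 3;
    indexCount += m.indices.length;
  }
  const positions = new Float32Array(vertexCount * 3);
  // 32-bit indices need the OES_element_index_uint extension in WebGL 1.
  const indices = new Uint32Array(indexCount);

  let vOffset = 0, iOffset = 0, baseVertex = 0;
  for (const m of meshes) {
    positions.set(m.positions, vOffset);
    for (let i = 0; i < m.indices.length; i++) {
      indices[iOffset + i] = m.indices[i] + baseVertex;  // re-base indices into the merged buffer
    }
    vOffset += m.positions.length;
    iOffset += m.indices.length;
    baseVertex += m.positions.length / 3;
  }
  return { positions, indices };
}
```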

Related

Difference and uses of InstancedMesh and InterleavedBuffer in ThreeJS

Can anyone help me out with the difference between InstancedMesh and InterleavedBuffer in three.js? I'm kind of confused by both topics. Can anyone let me know which is the better-optimized way to render a large amount of geometry?
Thanks in advance.
Instanced rendering and interleaved buffers are two separate things. You can use both techniques on their own or in combination.
THREE.InstancedMesh provides a convenient interface for instanced rendering. This approach is useful when you have to render a huge number of objects with the same material and geometry but different world transformations. THREE.InstancedMesh allows you to improve the performance of your app by reducing the number of draw calls: instead of issuing a separate draw call per object, you can draw them all at once.
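A minimal sketch of how THREE.InstancedMesh is typically used (the count and the transforms are just illustrative):

```js
const count = 1000;
const mesh = new THREE.InstancedMesh(geometry, material, count);

// Assign a different world transform to each instance via a throwaway Object3D.
const dummy = new THREE.Object3D();
for (let i = 0; i < count; i++) {
  dummy.position.set(Math.random() * 100, 0, Math.random() * 100);
  dummy.updateMatrix();
  mesh.setMatrixAt(i, dummy.matrix);
}
mesh.instanceMatrix.needsUpdate = true;

scene.add(mesh);  // all 1000 instances are rendered with a single draw call
```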
InterleavedBuffer makes it possible to manage your vertex data in an interleaved fashion. The motivation for doing this is to improve the rate of cache hits on the GPU. If you are more interested in the theory behind this approach, I suggest you google "structure of arrays vs. array of structures". The latter is what InterleavedBuffer implements.
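A small sketch of interleaving position and uv data with THREE.InterleavedBuffer and THREE.InterleavedBufferAttribute (the vertex data is made up):

```js
// Each vertex stores position (3 floats) and uv (2 floats) back to back: 5 floats per vertex.
const array = new Float32Array([
  // x, y, z, u, v
  0, 0, 0, 0, 0,
  1, 0, 0, 1, 0,
  1, 1, 0, 1, 1,
]);
const interleaved = new THREE.InterleavedBuffer(array, 5);  // stride = 5 floats

const geometry = new THREE.BufferGeometry();
geometry.setAttribute('position', new THREE.InterleavedBufferAttribute(interleaved, 3, 0));
geometry.setAttribute('uv', new THREE.InterleavedBufferAttribute(interleaved, 2, 3));
```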
In general, the performance benefits of both techniques depend on the specific use case. In my personal experience, the benefit of interleaved buffers is hard to measure, since the improvement depends on the respective GPU. In many cases I've seen no difference in FPS when using interleaved buffers. However, it's much easier to see a performance improvement if the number of draw calls is high and you lower it by using instanced rendering.
three.js provides examples for both techniques. webgl_buffergeometry_instancing_interleaved demonstrates a combination.
three.js R114

Dividing a sphere into multiple textures

I have a sphere with a texture of the Earth that I generate on the fly with the canvas element from an SVG file and then manipulate.
The texture size is 16384x8192, and anything smaller than this looks blurry on close zoom.
But this is a huge texture size and it causes memory problems... (it looks very good when it works, though).
I think a better approach would be to split the sphere into 32 separate textures, each 2048x2048 in size.
A few questions:
How can I split the sphere and assign the right textures?
Is this approach better in terms of memory and performance from a single huge texture?
Is there a better solution?
Thanks
You could subdivide a cube, and cubemap this.
Instead of having one texture per face, you would have NxN textures per face. 32 doesn't sound like a good number, but 24, for example, does (6x2x2).
You will still use the same amount of memory. If the shape actually needs to be spherical you can further subdivide the segments and normalize the entire shape (spherify it).
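A rough sketch of the spherify step, assuming each segment is a subdivided grid stored in a THREE.BufferGeometry and the cube is centered on the origin (geometry and radius are placeholders):

```js
// Push every vertex of the subdivided cube segment out onto the sphere surface.
const position = geometry.attributes.position;
const v = new THREE.Vector3();
for (let i = 0; i < position.count; i++) {
  v.fromBufferAttribute(position, i);
  v.normalize().multiplyScalar(radius);  // project onto a sphere of the desired radius
  position.setXYZ(i, v.x, v.y, v.z);
}
position.needsUpdate = true;
geometry.computeVertexNormals();
```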
You probably can't even use such a big texture anyway.
Typically, that's not something you'd do programmatically, but in a 3D program like Blender or 3ds Max. It involves some trivial mesh separation, UV mapping and material assignment. One other approach worth experimenting with would be to have multiple materials but only one mesh - you'd still get (somewhat) progressive loading. BUT
Are you sure you'd be better off with "chunks" loading sequentially rather than one big texture taking a huge amount of time to load? Sure, it'll improve things a bit in terms of timeouts and caching, but the tradeoff is having big chunks of your mesh be textureless, which is noticeable and unaesthetic.
There are a few approaches that would mitigate your problem. First, it's important to understand that texture loading optimization techniques - while common in game engines - aren't really part of three.js or what it's built for. You'll never get the near-seamless LODs or GPU optimization techniques that you get with UE4 or Unity. Furthermore, WebGL - while it has made many strides over the past decade - is not ideal for handling vast texture sizes, not at the GPU level (since it's based on OpenGL ES, suited primarily for mobile devices) and certainly not at the caching level - we're still dealing with browsers here. You won't find a lot of WebGL work done with textures of the dimensions you refer to.
Having said that,
A. A loader will let you do other things while your textures are loading, so your user isn't staring at an 'unfinished mesh'. It lets you be pretty clever with dynamic loading times and UX design. Additionally, take a look at this gist to get an idea of what a progressive texture loader could look like. A much more involved technique, which is JPEG-specific, can be found here, but I wouldn't approach it unless you're comfortable with low-level graphics programming.
B. Three.js does have a basic implementation of LOD, although I haven't tinkered with it myself and am not sure it's useful for textures; that said, the basic premise to look into is whether you can load progressively higher-resolution files on a per-need basis, just like Google Earth does, for example.
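For reference, a minimal sketch of how THREE.LOD is used (the meshes and distance thresholds are placeholders):

```js
const lod = new THREE.LOD();
lod.addLevel(highDetailMesh, 0);      // used when the camera is closer than 50 units
lod.addLevel(mediumDetailMesh, 50);
lod.addLevel(lowDetailMesh, 200);
scene.add(lod);

// The WebGLRenderer updates LOD objects automatically during render;
// lod.update(camera) can also be called manually if needed.
```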
C. This is out of the scope of your question - but I'd look into what happens under the hood in Unity's WebGL export, and what kind of clever tricks are employed there for similar purposes.
Finally, does your project have to be in WebGL? For something ambitious and demanding, sometimes "proper" OpenGL / DX makes much more sense.

Will an Octree improve performance if my scene contains less than 200K vertices?

I am finally making the move to OpenGL ES 2.0 and am taking advantage of a VBO to load all of the scene data onto the graphics card's memory. However, my scene is only around 200,000 vertices in size (and I know it depends somewhat on hardware), but does anyone think an octree would make any sense in this instance? (Incidentally, because of the viewpoint, at least 60% of the scene is visible most of the time.) Clearly I am trying to avoid having to implement an octree at such an early stage of my GLSL coding life!
There is no need to be worried about optimization and performance if the app you are coding is for learning purposes only. But given your question, apparently you intend to make a commercial app.
Using a VBO alone will not solve your app's performance problems, especially since you mentioned that you mean it to run on mobile devices. OpenGL ES has an optimized drawing option called GL_TRIANGLE_STRIP, which is particularly worthwhile for complex geometry.
It is also worth applying bump mapping to improve performance, in case you have textures in your model. With these two approaches your app will be noticeably improved.
Since you mention that much of your scene is visible most of the time, you should also use level of detail (LOD). To implement geometry LOD, you need a different mesh for each level you wish to use, and each successive level has fewer polygons than the one before it. You can build the geometry for each LOD yourself, or use 3D software to generate it automatically.
Some free tools can automatically perform generic optimization directly on your GLSL ES code, and they are really worth checking out.
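A minimal sketch of distance-based LOD selection, shown in JavaScript to match the rest of this page (lodLevels is a made-up structure):

```js
// Choose which precomputed mesh to draw based on the object's distance from the
// camera. "lodLevels" is an array of { maxDistance, mesh } sorted nearest to farthest.
function selectLOD(objectPosition, cameraPosition, lodLevels) {
  const dx = objectPosition[0] - cameraPosition[0];
  const dy = objectPosition[1] - cameraPosition[1];
  const dz = objectPosition[2] - cameraPosition[2];
  const distance = Math.sqrt(dx * dx + dy * dy + dz * dz);

  for (const level of lodLevels) {
    if (distance <= level.maxDistance) return level.mesh;
  }
  return lodLevels[lodLevels.length - 1].mesh;  // beyond all thresholds: lowest-poly mesh
}
```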

OpenGL - Will using multiple VBO's slow down rendering?

I am rendering some meshes (sometimes upwards of 500) and I wanted to know the best way to approach this. Would it be pointless to create 500 VBOs and then render each one that passes the frustum and visibility tests? Is there a more efficient way to do this? I am looking to maximize performance.
To answer your question: yes, many VBOs will slow things down. More polys will usually slow down the render, but more draw calls have a much greater impact. You want to minimize state changes and draws, as well as the number of buffers you have (and memory use).
I would suggest first looking at the buffers and figuring out how many you actually need. See whether you can batch/instance geometry, merge static geometry into a single buffer, reuse buffers more efficiently, and so on.
Once you've cut the buffers down to the minimum possible, you'll want to use culling of multiple sorts. Visibility culling, both by frustum (perhaps in an octree) and by occlusion, can provide a significant performance boost. The main idea is to disqualify the geometry as quickly and simply as possible, so you start with rough tests (octree), then somewhat more detailed ones (perhaps an AABB and/or simplified hull), then occlusion, then actually draw.
Here's a good article on frustum culling, which touches a bit on quadtrees (and, by extension, octrees), with diagrams, explanations and some sample code.
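As an illustration of the "rough test" idea, here is a minimal sketch of an AABB-vs-frustum check (the plane and box representations are assumptions, not from any particular library):

```js
// "planes" is an array of six { normal: [x, y, z], d } objects describing the view
// frustum (points inside satisfy normal·p + d >= 0); "box" is { min: [x,y,z], max: [x,y,z] }.
function boxIntersectsFrustum(box, planes) {
  for (const plane of planes) {
    // Pick the box corner farthest along the plane normal (the "positive vertex").
    const px = plane.normal[0] >= 0 ? box.max[0] : box.min[0];
    const py = plane.normal[1] >= 0 ? box.max[1] : box.min[1];
    const pz = plane.normal[2] >= 0 ? box.max[2] : box.min[2];
    const distance =
      plane.normal[0] * px + plane.normal[1] * py + plane.normal[2] * pz + plane.d;
    if (distance < 0) return false;  // entirely outside this plane: cull it
  }
  return true;  // inside or intersecting the frustum: draw it
}
```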
OpenGL occlusion culling articles seem a bit less common, although this one from GPU Gems might be a good starting place.

How to prevent overdrawing?

This is a difficult question to search for on Google, since the term has another meaning in finance.
Of course, what I mean here is "drawing" as in computer graphics, not money.
I am interested in preventing overdrawing for both 3D Drawing and 2D Drawing.
(should I make them into two different questions?)
I realize that this might be a very broad question since I didn't specify which technology to use. If it is too broad, maybe some hints on some resources I can read up will be okay.
EDIT:
What I mean by overdrawing is:
when you draw too many objects, rendering a single frame will be very slow
when you draw more area than you need to, rendering a single frame will be very slow
It's quite a complex topic.
The first thing to consider is frustum culling. It will filter out objects that are not in the camera's field of view, so you can simply skip them at the render stage.
The second thing is Z-sorting of the objects that are in view. It is better to render them from front to back, so that near objects write their "near" values to the depth buffer and the pixels of far objects are not drawn because they fail the depth test. This saves your GPU's fill rate and pixel-shader work. Note, however, that if you have semitransparent objects in the scene, they should be drawn after the opaque geometry, in back-to-front order, to make alpha blending possible.
Both things are achievable if you use some kind of space partitioning, such as an octree or quadtree. Which is better depends on your game: a quadtree is better for big open spaces and an octree is better for indoor spaces with many levels.
And don't forget about simple back-face culling, which can be enabled with a single line in DirectX and OpenGL, to prevent drawing faces that present their back side to the camera.
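For reference, a minimal WebGL sketch of the back-face culling line plus the sorting described above (the object lists and distanceToCamera are placeholders):

```js
// Back-face culling: the "single line" (plus mode selection) in WebGL.
gl.enable(gl.CULL_FACE);
gl.cullFace(gl.BACK);

// Opaque objects front to back so the depth test rejects hidden pixels early;
// semitransparent objects afterwards, back to front, so alpha blending is correct.
opaqueObjects.sort((a, b) => a.distanceToCamera - b.distanceToCamera);
transparentObjects.sort((a, b) => b.distanceToCamera - a.distanceToCamera);
```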
Question is really too broad :o) Check out these "pointers" and ask more specifically.
Typical overdraw inhibitors are:
Z-buffer
Occlusion based techniques (various buffer techniques, HW occlusions, ...)
Stencil test
at a slightly higher logical level:
culling (usually by view frustum)
scene organization techniques (usually trees or tiling)
rough drawing front to back (this is obviously a supporting technique :o)
EDIT: added stencil test, has indeed interesting overdraw prevention uses especially in combination of 2d/3d.
Reduce the number of objects you consider for drawing based on distance and on position (i.e. reject those outside of the viewing frustum).
Also consider using some sort of object-based occlusion system to allow large objects to obscure small ones. However this may not be worth it unless you have a lot of large objects with fairly regular shapes. You can pre-process potentially visible sets for static objects in some cases.
Your API will also typically reject polygons that are not facing the viewpoint, since you usually don't want to draw their rear faces.
When it comes to actual rendering time, it's often helpful to render opaque objects from front-to-back, so that the depth-buffer tests end up rejecting entire polygons. This works for 2D too, if you have depth-buffering turned on.
Remember that this is a performance optimisation problem. Most applications will not have a significant problem with overdraw. Use tools like Pix or NVIDIA PerfHUD to measure your problem before you spend resources on fixing it.
