Cocos2d 2.0 and OpenGL Analyzer suggestions - Xcode

I have analyzed my game by running the OpenGL Analyzer in Xcode. I am using Cocos2d 2.0 as a static library in my game and wonder whether any of the following suggestions will improve my performance. I have read some posts on other forums saying that I should not worry about this, but as I do have some performance issues I would like to understand whether those suggestions are likely to improve them.
Suggestions / Overview: (analyzer screenshots omitted)
Thinking:
In particular I refer to the suggestion where it says:
"reccomended using VAO and VBO"
I also wonder why there are "Many small batch draw calls". I am using a sprite batch node, which should avoid this issue.
The other suggestions also seem to make sense, but these are the most "frequent" ones, so I would like to start by analyzing them.

A "small batch draw call" is anything with fewer than n-many vertices. I am not sure the exact threshold used, but it is probably on the order of 100-200. What spritebatches really do is eliminate the need to split your draw calls up multiple times in order to switch bound textures, this does not automatically imply that each draw call is going to have more than 100 (or whatever n is defined as in this context) vertices; it is a strong possibility, but not necessary.
I would be far more concerned about non-VBO draw calls and not using VAOs to be honest, especially if you want your code to be forward-compatible.
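For reference, here is a minimal sketch of what the analyzer is asking for, assuming an iOS ES 2.0 context where the GL_OES_vertex_array_object extension is available; the Vertex struct, the vertices/vertexCount data and the kPositionAttrib/kTexCoordAttrib locations are placeholders, not anything from the question (offsetof requires <stddef.h>):
// Upload the vertex data into a VBO once, and record the attribute setup in a VAO.
GLuint vbo, vao;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, vertexCount * sizeof(Vertex), vertices, GL_STATIC_DRAW);
glGenVertexArraysOES(1, &vao);
glBindVertexArrayOES(vao);
glEnableVertexAttribArray(kPositionAttrib);
glVertexAttribPointer(kPositionAttrib, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void *)offsetof(Vertex, position));
glEnableVertexAttribArray(kTexCoordAttrib);
glVertexAttribPointer(kTexCoordAttrib, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void *)offsetof(Vertex, texCoord));
glBindVertexArrayOES(0);
// At draw time, the whole attribute setup comes back with a single bind:
glBindVertexArrayOES(vao);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);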
The "Logical Buffer Load" and "Mipmapping Usage" warnings are very likely related; probably both having to do with FBOs. One of them is related to not using glClear (...) properly and the other is related to using a texture that does not have mipmaps.
Regarding logical buffer loads, you should look into GL_EXT_discard_framebuffer; discarding and then clearing the framebuffer this way is a really healthy optimization strategy for tile-based deferred rendering GPUs (such as the ones used by all iOS devices).
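A minimal sketch of the idea, assuming an iOS context where EXT_discard_framebuffer is available (framebuffer here is a placeholder handle): hint to the driver that the depth contents are not needed after the frame, and start the next frame with a full glClear so nothing has to be loaded back into tile memory.
// End of frame: the depth buffer's contents are not needed again, so discard them.
const GLenum discards[] = { GL_DEPTH_ATTACHMENT };
glBindFramebuffer(GL_FRAMEBUFFER, framebuffer);
glDiscardFramebufferEXT(GL_FRAMEBUFFER, 1, discards);
// Start of next frame: clear everything instead of preserving last frame's pixels.
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);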
As for the mipmap usage warning, I believe this is being triggered because you are drawing into an FBO texture and then applying that texture using a mipmap filter mode. The mip-chain/pyramid for a texture rendered through an FBO has to be built manually using glGenerateMipmap (...).
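A sketch of the order of operations, assuming fboTexture is the color attachment of an offscreen FBO (the names here are placeholders):
// 1. Render into the FBO texture as usual.
glBindFramebuffer(GL_FRAMEBUFFER, offscreenFBO);
drawSceneIntoFBO();   /* placeholder for the existing drawing code */
// 2. Build the mip chain for the freshly rendered texture.
glBindFramebuffer(GL_FRAMEBUFFER, defaultFBO);
glBindTexture(GL_TEXTURE_2D, fboTexture);
glGenerateMipmap(GL_TEXTURE_2D);
// 3. Only now is a mipmapped minification filter valid for sampling it.
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);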
If you can point me to some individual lines that trigger these warnings, I would be happy to explain them in further detail for you.

Related

pow2 textures in FBO

Non power of two textures are very slow in OpenGL ES 2.0.
But in every "render-to-texture" tutorial I have seen, people just take the screen size (which is never pow2) and make a texture from it.
Should I render to a pow2 texture (with projection matrix correction), or is there some kind of magic with FBOs?
I don't buy into the "non power of two textures are very slow" premise in your question. First of all, these kinds of performance characteristics can be highly hardware dependent. So saying that this is true for ES 2.0 in general does not really make sense.
I also doubt that any GPU architectures developed within the last 5 to 10 years would be significantly slower when rendering to NPOT textures. If there's data that shows otherwise, I would be very interested in seeing it.
Unless you have conclusive data that shows POT textures to be faster for your target platform, I would simply use the natural size for your render targets.
If you're really convinced that you want to use POT textures, you can use glViewport() to render to part of them, as @MaticOblak also points out in a comment.
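If you go that route, the idea looks roughly like this (the 1024/960/640 sizes are just an example): allocate the POT texture, but draw into and sample from only the screen-sized corner of it.
// 1024x1024 POT render target, but only a 960x640 region of it is used.
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glViewport(0, 0, 960, 640);
drawScene();   /* placeholder */
// When sampling, scale the texture coordinates so they cover only that region:
// u' = u * (960.0 / 1024.0), v' = v * (640.0 / 1024.0)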
There's one slight caveat to the above: ES 2.0 has some limitations on how NPOT textures can be used. According to the standard, they do not support mipmapping, and only the CLAMP_TO_EDGE wrap mode. The GL_OES_texture_npot extension, which is supported on many devices, removes these limitations.
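For completeness, a sketch of what staying within the core ES 2.0 NPOT rules looks like, and of checking for the extension before relying on more (npotTexture is a placeholder; strstr needs <string.h>):
// Core ES 2.0: NPOT textures must use CLAMP_TO_EDGE wrapping and a non-mipmapped filter.
glBindTexture(GL_TEXTURE_2D, npotTexture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
// The restrictions are lifted if the extension is present:
const char *ext = (const char *)glGetString(GL_EXTENSIONS);
int hasNPOT = (ext != NULL) && (strstr(ext, "GL_OES_texture_npot") != NULL);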

OpenGL (ES): Can an implementation optimize fragments resulting from overdraw?

I wanted to come up with a crude way to "benchmark" the performance improvement of a tweak I made to a fragment shader (specifically, I wanted to test the performance impact of removing the pow-based gamma computation for the resulting color in the fragment shader).
So I figured that if a frame took 1 ms to render an opaque cube model using my shader, then with glDisable(GL_DEPTH_TEST) set and my render call looped 100 times, the frame would take 100 ms to render.
I was wrong. Rendering it 100 times only results in about a 10x slowdown. Obviously, if the depth test were still enabled, most if not all of the fragments in the second and subsequent draw calls would not be computed because they would all fail the depth test.
However, I must still be experiencing a lot of fragment culls even with the depth test off.
My question is about whether my hardware (in this particular situation it is an iPad3 on iOS6.1 that I am experiencing this on -- a PowerVR SGX543MP4) is just being incredibly smart and is actually able to use the geometry of later draw calls to occlude and discard fragments from the earlier geometry. If this is not what's happening, then I cannot explain the better-than-expected performance that I am seeing. The question applies to all flavors of OpenGL and desktop GPUs as well, though.
Edit: I think an easy way to "get around" this optimization might be glEnable(GL_BLEND) or something of that sort. I will try this and report back.
PowerVR hardware is based on tile-based deferred rendering. It does not begin drawing fragments until after it receives all of the geometry information for a tile on screen. This is a more advanced hidden-surface removal technique than z-buffering, and what you have actually discovered here is that enabling alpha blending breaks the hardware's ability to exploit this.
Alpha blending is very order-dependent, so rasterization and shading can no longer be deferred to the point where only the top-most geometry in a tile has to be drawn. Without alpha blending, since there is no data dependency on the order things are drawn in, completely obscured geometry can be skipped before expensive per-fragment operations occur. It is only when you start blending fragments that a true order-dependent situation arises and completely destroys the hardware's ability to defer/cull fragment processing for hidden surfaces.
In all honesty, if you are trying to optimize for a platform based on PowerVR hardware you should probably make this one of your goals. By that I mean: before optimizing shaders, first consider whether you are drawing things in an order and/or with states that hurt the PowerVR hardware's ability to do TBDR. As you have just discovered, blending is considerably more expensive on PowerVR hardware than on other hardware... the operation itself is no more complicated; it just prevents PVR hardware from working the special way it was designed to.
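As a rough sketch of that advice (the draw functions here are placeholders, not anything from the question), keep blending disabled for everything opaque and only enable it for a final transparent pass:
// Opaque pass first: no blending, so the TBDR hardware can still reject hidden fragments.
glDisable(GL_BLEND);
glEnable(GL_DEPTH_TEST);
drawOpaqueObjects();              /* placeholder */
// Transparent pass last, sorted back to front, with blending enabled.
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glDepthMask(GL_FALSE);            /* still depth-test, but don't write depth for transparent geometry */
drawBlendedObjectsBackToFront();  /* placeholder */
glDepthMask(GL_TRUE);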
I can confirm that only after adding both lines:
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
did the frame render time increase in a linear fashion in response to the repeated draw calls. Now back to my crude benchmarking.
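For what it's worth, the kind of crude timing loop being described might look something like this (getTimeSeconds, indexCount and the draw call are placeholders; the glFinish is there so the GPU has actually finished before the clock is read):
// Draw the same opaque cube 100 times and time the whole batch.
double t0 = getTimeSeconds();     /* placeholder high-resolution timer */
for (int i = 0; i < 100; ++i) {
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);
}
glFinish();
double elapsed = getTimeSeconds() - t0;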

Will an Octree improve performance if my scene contains less than 200K vertices?

I am finally making the move to OpenGL ES 2.0 and am taking advantage of a VBO to load all of the scene data onto the graphics card's memory. However, my scene is only around 200,000 vertices in size (and I know it depends somewhat on hardware), but does anyone think an octree would make sense in this instance? (Incidentally, because of the viewpoint, at least 60% of the scene is visible most of the time.) Clearly I am trying to avoid having to implement an octree at such an early stage of my GLSL coding life!
There is no need to worry about optimization and performance if the App you are coding is for learning purposes only. But given your question, apparently you intend to make a commercial App.
Using a VBO alone will not solve your App's performance problems, especially since you mention that you mean it to run on mobile devices. OpenGL ES has an optimized drawing option called GL_TRIANGLE_STRIP, which is particularly worthwhile for complex geometry.
Another interesting addition for improving performance is to apply bump mapping, in case you have textures in your model. With these two approaches your App will be noticeably improved.
Since you mention that most of your scene is visible all the time, you should also use level of detail (LOD). To implement geometry LOD, you need a different mesh for each level you wish to use, with each level having fewer polygons than the one closer to the camera. You can build the geometry for each LOD yourself, or use 3D software to generate it automatically.
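A minimal sketch of the selection step, assuming you have prepared a few meshes per object ahead of time (the distance thresholds and names here are made up):
// Pick a pre-built mesh based on distance from the camera.
typedef struct { GLuint vbo; GLsizei vertexCount; } Mesh;

const Mesh *selectLOD(const Mesh lods[3], float distanceToCamera) {
    if (distanceToCamera < 20.0f) return &lods[0];  /* full detail */
    if (distanceToCamera < 60.0f) return &lods[1];  /* reduced detail */
    return &lods[2];                                /* coarse mesh for far-away objects */
}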
Some free tools can also perform generic optimization directly on your GLSL ES code automatically, and they are really worth checking out.

j2me: what is effect of 'g.clipRect()' on speed?

I'm developing in J2ME and using a Canvas to draw some images.
Now, my question is: what is the difference in drawing speed between the sample codes below?
Drawing after clipping to the area rectangle:
g.clipRect(x, y, myImage.getWidth(), myImage.getHeight());
g.drawImage(myImage, x , y, Graphics.TOP | Graphics.LEFT);
g.setClip(0, 0, screenWidth, screenHeight);
Drawing without clipping:
g.drawImage(myImage, x, y, Graphics.TOP | Graphics.LEFT);
Is the first one faster? I'm drawing a lot on screen.
Well, the direct answer to your question would be Mu, I'm afraid - because you appear to be approaching the issue from the wrong direction.
The thing is, the clipping API is not intended for performance considerations or optimizations. You can find full coverage of its purpose in the API documentation (available online); it does not state anything related to performance impact:
Clipping
The clip is the set of pixels in the destination of the Graphics object that may be modified by graphics rendering operations.
There is a single clip per Graphics object. The only pixels modified by graphics operations are those that lie within the clip. Pixels outside the clip are not modified by any graphics operations.
Operations are provided for intersecting the current clip with a given rectangle and for setting the current clip outright...
Attempting to use the clipping API for imaginary performance considerations will make your code a nightmare for future maintainers to understand. Note that this future maintainer may be you yourself, just a few weeks / months / years later. I for one have had my nose broken on my own code written some time ago without a clearly understandable intent - trust me, it hurts just as much as messing with poor code written by anyone else.
Don't get me wrong - there is a chance that clipping may have a substantial performance impact in some particular case on a specific device - why not, everything is possible given the variety of MIDP implementations. Know what? There is even a chance of it having the opposite impact on some other device, why not.
If (if) that happens, if (if) you'll somehow get a clear, solid, tested and proven justification of specific performance impact - then (then), go ahead, implement whatever tricks necessary to reach required performance, no matter how perverse they may be (BTDTGTTS). Until then, though, drop any baseless assumptions that just may come to your mind.
Until then... Just. Drop. It.
Developers love to optimize code and with good reason. It is so satisfying and fun. But knowing when to optimize is far more important. Unfortunately, developers generally have horrible intuition about where the performance problems in an application will actually be... Most performance tuning reminds me of the old joke about the guy who's looking for his keys in the kitchen even though he lost them in the street, because the light's better in the kitchen... (Brian Goetz)
This will almost certainly vary between platforms, and will depend on how much you're actually drawing.
I suggest you measure performance yourself by logging the number of paints per second, or the average duration of a paint method, and painting this on screen.
Drawing without clip should be faster on any platform for the simple reason that you are not calling two clip methods. But I might ask, why are you using clip to begin with?
You usually use clipping when you have an animation sprite or an icon variation in the same file. In this case you can create a file for each frame/icon. It will increase your jar file size and use more heap space to hold these images in memory, but they will be drawn faster.

How to prevent overdrawing?

This is a difficult question to search for in Google since it has another meaning in finance.
Of course, what I mean here is "Drawing" as in .. computer graphics.. not money..
I am interested in preventing overdrawing for both 3D Drawing and 2D Drawing.
(should I make them into two different questions?)
I realize that this might be a very broad question since I didn't specify which technology to use. If it is too broad, maybe some hints on some resources I can read up will be okay.
EDIT:
What I mean by overdrawing is:
when you draw too many objects, rendering a single frame will be very slow
when you draw more area than what you need, rendering a single frame will be very slow
It's quite a complex topic.
The first thing to consider is frustum culling. It will filter out objects that are not in the camera's field of view, so you can simply skip them at the render stage.
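As an illustration of the idea (extracting the six planes from the view-projection matrix is left out of this sketch), a bounding-sphere test against the frustum planes is usually enough:
typedef struct { float a, b, c, d; } Plane;  /* plane equation ax + by + cz + d = 0, normal pointing inward */

/* Returns 0 if the bounding sphere is completely outside the frustum. */
int sphereInFrustum(const Plane planes[6], float cx, float cy, float cz, float radius) {
    for (int i = 0; i < 6; ++i) {
        float dist = planes[i].a * cx + planes[i].b * cy + planes[i].c * cz + planes[i].d;
        if (dist < -radius)
            return 0;  /* entirely outside this plane: cull the object */
    }
    return 1;          /* at least partially inside: submit it for rendering */
}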
The second thing is Z-sorting of the objects that are in view. It is better to render them from front to back, so that near objects write their "near" values to the depth buffer and the pixels of far objects are not drawn because they fail the depth test. This saves your GPU's fill rate and pixel-shader work. Note, however, that if you have semitransparent objects in the scene, they should be drawn after the opaque geometry, in back-to-front order, to make alpha blending possible.
Both things are achievable if you use some kind of space partitioning, such as an octree or a quadtree. Which is better depends on your game: a quadtree is better for big open spaces and an octree is better for indoor spaces with many levels.
And don't forget about simple back-face culling, which can be enabled with a single line in DirectX and OpenGL to prevent drawing faces whose back side is turned toward the camera.
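In OpenGL (ES) that single line looks like this (front faces assumed counter-clockwise, which is the default winding):
glEnable(GL_CULL_FACE);   /* discard triangles that face away from the camera */
glCullFace(GL_BACK);
glFrontFace(GL_CCW);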
Question is really too broad :o) Check out these "pointers" and ask more specifically.
Typical overdraw inhibitors are:
Z-buffer
Occlusion based techniques (various buffer techniques, HW occlusions, ...)
Stencil test
On a little bit higher logic level:
culling (usually by view frustum)
scene organization techniques (usually trees or tiling)
rough drawing front to back (this is obviously a supporting technique :o)
EDIT: added stencil test, has indeed interesting overdraw prevention uses especially in combination of 2d/3d.
Reduce the number of objects you consider for drawing based on distance, and on position (i.e. reject those outside of the viewing frustum).
Also consider using some sort of object-based occlusion system to allow large objects to obscure small ones. However this may not be worth it unless you have a lot of large objects with fairly regular shapes. You can pre-process potentially visible sets for static objects in some cases.
Your API will also typically reject polygons that are not facing the viewpoint, since you typically don't want to draw the rear face.
When it comes to actual rendering time, it's often helpful to render opaque objects from front-to-back, so that the depth-buffer tests end up rejecting entire polygons. This works for 2D too, if you have depth-buffering turned on.
Remember that this is a performance optimisation problem. Most applications will not have a significant problem with overdraw. Use tools like Pix or NVIDIA PerfHUD to measure your problem before you spend resources on fixing it.
