Rendering only to a part of a texture - opengl-es

Can I bind a 2000x2000 texture to a color attachment in a FBO, and tell OpenGL to behave exactly as if the texture was smaller, let's say 1000x1000?
The point is, in my rendering cycle I need many (mostly small) intermediate textures to render to, but only one at a time. I am thinking that, rather than creating many smaller textures, I could keep a single appropriately large one, bind it to the FBO at hand, tell OpenGL to render only to part of it, and save memory.
Or maybe I should be destroying/recreating those textures many times per frame? That would certainly save even more memory, but wouldn't that cause a noticeable slowdown?

Can I bind a 2000x2000 texture to a color attachment in a FBO, and
tell OpenGL to behave exactly as if the texture was smaller, let's say
1000x1000?
Yes, just set glViewport() to the region you want to render to, and remember to adjust the glScissor() region as well if you ever enable scissor testing.
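For example, a minimal sketch (the fbo and bigTexture handles are assumed to have been created already) could look like this:
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, bigTexture, 0);

// Restrict rasterization to a 1000x1000 corner of the 2000x2000 attachment.
glViewport(0, 0, 1000, 1000);

// glViewport() does not clip glClear(), so clamp the scissor region too.
glEnable(GL_SCISSOR_TEST);
glScissor(0, 0, 1000, 1000);
glClear(GL_COLOR_BUFFER_BIT);

// ... draw ...
// When sampling this region later, use texture coordinates in [0, 0.5].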
Or maybe I should be destroying/recreating those textures many times
per frame? That would certainly save even more memory, but wouldn't
that cause a noticeable slowdown?
Completely destroying and recreating texture objects every frame will be slow because of the constant memory reallocation overhead, so definitely don't do that.
Having a pool of pre-allocated textures which you cycle through is fine though - that's a pretty common technique. You won't really save much in terms of memory storing a 2K*2K texture vs storing 4 separate 1K*1K textures - the total storage requirement is the same and the additional metadata overhead is tiny in comparison - so if keeping them separate is easier in terms of application logic I'd suggest doing that.
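If you do go the pool route, it doesn't need to be anything fancier than a sketch like this (assuming the GLuint handles were all allocated up front with glTexImage2D or glTexStorage2D):
#include <vector>

// GLuint comes from your GL header, e.g. <GLES2/gl2.h>.
struct TexturePool {
    std::vector<GLuint> free_;        // textures not currently bound to the FBO

    GLuint acquire() {                // assumes the pool was filled at startup
        GLuint tex = free_.back();
        free_.pop_back();
        return tex;
    }
    void release(GLuint tex) {        // hand the texture back after the pass
        free_.push_back(tex);
    }
};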

Related

Strategies for optimizing OpenGL-based animated/zoomable 2D UI rendering

I've built a scene graph based rendering system for my app's UI (http://audulus.com if you're curious). The app's UI is procedural, and a lot of it is animated. There are few pre-rendered images.
Currently, it caches unchanging groups of drawables in mipmapped textures. I use mipmaps because the UI is zoomable. Overall, this has been a big performance win, but there are several downsides:
Building the mipmaps (via glGenerateMipmap) takes time, reducing the frame rate when one part of the UI goes from animated to static.
Visual differences between the texture-cached geometry and the directly rendered geometry, causing slight flickering. (I might be able to get around this by being more clever with my path rendering code, but it seems hard.)
Memory usage for all the textures (I could dump the offscreen textures, but that exacerbates problem 1)
A couple alternative approaches I've thought of:
Instead of texture caching, coalesce static paths into bigger paths. My paths are already VBO/VAO-based, but this could reduce the number of GL calls. (When turning off texture caching, my performance is mainly CPU-bound.) Big win on memory usage. The primary problems with this approach are: complicating my path rendering shader (since it must handle paths with different attributes within one call to glDrawArrays), not handling the caching of other primitives (such as text), and placing more of a burden on the GPU than simply rendering a texture. (See the rough sketch after these two approaches.)
Still use textures, but avoid mip-mapping. As the UI is zoomed, textures could be resized (though this might have to be deferred since re-rendering the whole UI during zooming is too expensive). Delete textures for offscreen geometry. Downside of course is poor texture magnification/minification during UI zooming.
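To make approach 1 concrete, here is a rough, hypothetical sketch of the coalescing step - the Path type, staticPaths, batchVbo, and the vertices() accessor are placeholders rather than actual Audulus code, and the per-path color stands in for whatever attributes the shader needs per path:
#include <cstddef>
#include <vector>

struct PathVertex {
    float position[2];
    float color[4];    // formerly a per-path uniform, now baked into each vertex
};

// Rebuild the batch only when the set of static paths changes.
void rebuildStaticBatch(const std::vector<Path>& staticPaths, GLuint batchVbo) {
    std::vector<PathVertex> batched;
    for (const Path& p : staticPaths)
        for (const PathVertex& v : p.vertices())     // hypothetical accessor
            batched.push_back(v);

    glBindBuffer(GL_ARRAY_BUFFER, batchVbo);
    glBufferData(GL_ARRAY_BUFFER, batched.size() * sizeof(PathVertex),
                 batched.data(), GL_STATIC_DRAW);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, sizeof(PathVertex),
                          (void*)offsetof(PathVertex, position));
    glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, sizeof(PathVertex),
                          (void*)offsetof(PathVertex, color));
    glEnableVertexAttribArray(0);
    glEnableVertexAttribArray(1);
}

// At draw time, one call then covers every static path:
//   glDrawArrays(GL_TRIANGLES, 0, vertexCount);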
UPDATE
I tried (2). Resizing the textures is quite slow, so I prevent the UI from resizing them during zoom. This works reasonably well, but the magnification looks terrible when zooming in after starting small.
Note that some of the modules aren't texture cached because they are tagged as animating.
UPDATE 2
I'm beginning to work on approach 1, so I deactivated the texture caching.
Though I'm CPU bound, practically all my GPU-side load comes from my path anti-aliasing fragment shader; profiling with the shader on versus off makes the difference obvious.
So further optimization of that shader will be a big win on the GPU side. I tried ditching it and going with 4x supersampling, but that looks like garbage, reminding me why I spent considerable time working on the path rendering shader.
Approach (1) is a big win. It has essentially the same performance as texture caching, but without any visual artifacts or much memory usage.

webgl performance cost of switching shader and texture

I. To switch between shader effects, which way is better?
1. Use one big shader program, with a uniform and an if/else clause in the shader to select the effect.
2. Switch programs between draw calls.
II. Is it better to use one big texture or several small textures? And does uploading a texture cost much? How about binding a texture?
Well, it would probably be best to write some perf tests and try it, but in general:
Small shaders are faster than big ones.
1 texture is faster than many textures.
Uploading textures is slow.
Binding textures is fast.
Switching programs is slow, but usually much faster than combining 2 small programs into 1 big program.
Fragment shaders in particular get executed millions of times a frame. A 1920x1080 display has 2 million pixels, so even if there were no overdraw your shader would still get executed 2 million times per frame. For anything executed 2 million times a frame, or 120 million times a second if you're targeting 60 frames per second, smaller is going to be better.
As for textures, mips are faster than no mips because the GPU has a cache for textures, and if the pixels it needs next are near the ones it previously read, they'll likely already be in the cache. If they are far away, they won't be in the cache. That also means randomly reading from a texture is particularly slow. But most apps read fairly linearly through a texture.
Switching programs is slow enough that sorting models by which program they use, so that you draw all models that use program A first and then all models that use program B, is generally faster than drawing them in a random order. But there are other things that affect performance too. For example, if a large model is obscuring a small model, it's better to draw the large model first, since the small model will then fail the depth test (z-buffer) and will not have its fragment shader executed for any pixels. So it's a trade-off. All you can really do is test your particular application.
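For instance, a sort-then-draw loop might look roughly like the C++/GL-style sketch below (WebGL's calls mirror these; the DrawItem fields are assumptions, not a real engine):
#include <algorithm>
#include <tuple>
#include <vector>

struct DrawItem {
    GLuint program;
    GLuint texture;
    GLuint vao;
    GLsizei vertexCount;
};

void drawAll(std::vector<DrawItem>& items) {
    // Group by program first (expensive switch), then by texture (cheaper).
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return std::tie(a.program, a.texture) <
                         std::tie(b.program, b.texture);
              });

    GLuint boundProgram = 0, boundTexture = 0;
    for (const DrawItem& it : items) {
        if (it.program != boundProgram) {
            glUseProgram(it.program);
            boundProgram = it.program;
        }
        if (it.texture != boundTexture) {
            glBindTexture(GL_TEXTURE_2D, it.texture);
            boundTexture = it.texture;
        }
        glBindVertexArray(it.vao);
        glDrawArrays(GL_TRIANGLES, 0, it.vertexCount);
    }
}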
Also, it's important to test in the correct way.
http://updates.html5rocks.com/2012/07/How-to-measure-browser-graphics-performance

Huge memory used when loading multiple animations and textures with Cocos2d, how to solve it

I am working on a gameplay scene which needs to load 27 texture atlases (each one 1024 * 1024) before entering the game scene,
but sometimes my game crashes because it receives a memory warning.
I know 27 texture atlases will use:
4 * 27 * 1024 * 1024 bytes = 108 MB of memory
which is a huge amount, but I really need to load them before entering the game.
Is there any way to solve my issue?
Any ideas would be very much appreciated!
BTW:
I am using cocos2d 1.0.1
Best suggestion is to review your design, and the 'need' for preloading all these textures. I tend to pre-load only the textures that are most frequently used (animations and static map objects).
For example, I have textures for animating walks on a map for 16 character classes. I regrouped the 'idle' animations into 4 textures, and preload these, because initially, when a soldier enters the scene, it idles. The moving animations are in separate textures that are loaded in real time, as a function of the direction of travel, for each character class in movement. When the characters stop walking (idle), I remove unused textures from the cache, as well as unused sprite frames.
Also: there are other avenues for memory management. You could use a 16-bit format for certain textures (RGBA8888 is the default). You may also gain by converting to the compressed PVR format (once again this is lossy, but could be fine for some textures).
Look here and there to learn more about picture formats in cocos2d, and the relationship to memory consumption (as well as load and rendering speeds). But once again, before you start optimizing, make certain you have no alternative to the pre-load-all approach.
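For a rough sense of what those format changes buy (bit depths assumed), here is the arithmetic for 27 atlases of 1024 * 1024 texels:
#include <cstdio>

int main() {
    const double texels = 27.0 * 1024 * 1024;
    const double MB = 1024.0 * 1024.0;
    std::printf("RGBA8888 (4 bytes/texel): %6.1f MB\n", texels * 4.0 / MB); // ~108 MB
    std::printf("RGBA4444 (2 bytes/texel): %6.1f MB\n", texels * 2.0 / MB); // ~54 MB
    std::printf("PVRTC 4bpp (lossy):       %6.1f MB\n", texels * 0.5 / MB); // ~13.5 MB
    return 0;
}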
Use JPG instead of PNG. JPG has no transparency, but you can add transparency back with a separate alpha image for each texture; this will help you reduce the size to almost half of what you are using now.

DirectX9 - Efficiently Drawing Sprites

I'm trying to create a platformer game, and I am taking various sprite blocks, and piecing them together in order to draw the level. This requires drawing a large number of sprites on the screen every single frame. A good computer has no problem handling drawing all the sprites, but it starts to impact performance on older computers. Since this is NOT a big game, I want it to be able to run on almost any computer. Right now, I am using the following DirectX function to draw my sprites:
D3DXVECTOR3 center(0.0f, 0.0f, 0.0f);
D3DXVECTOR3 position(static_cast<float>(x), static_cast<float>(y), z);
// The NULL second argument is the source RECT; NULL draws the whole texture.
(my LPD3DXSPRITE object)->Draw((sprite texture pointer), NULL, &center, &position, D3DCOLOR_ARGB(a, r, g, b));
Is there a more efficient way to draw these pictures on the screen? Is there a way that I can use less complex picture files (I'm using regular png's right now) to speed things up?
To sum it up: What is the most performance friendly way to draw sprites in DirectX? thanks!
The ID3DXSPRITE interface you are using is already pretty efficient. Make sure all your sprite draw calls happen in one batch if possible between the sprite begin and end calls. This allows the sprite interface to arrange the draws in the most efficient way.
For extra performance you can load multiple smaller textures into one larger texture and use texture coordinates to pull the individual images out. This means textures don't have to be swapped as frequently. See:
http://nexe.gamedev.net/directknowledge/default.asp?p=ID3DXSprite
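As a hedged sketch of both points (the atlas layout, the 32-pixel tile size, and names like atlasTexture and levelBlocks are just illustrative):
// One Begin/End pair per frame; D3DXSPRITE_SORT_TEXTURE lets ID3DXSprite
// reorder the queued draws to minimize texture changes.
sprite->Begin(D3DXSPRITE_ALPHABLEND | D3DXSPRITE_SORT_TEXTURE);

D3DXVECTOR3 center(0.0f, 0.0f, 0.0f);
for (const Block& b : levelBlocks) {
    // The source RECT picks one tile out of the combined atlas texture.
    RECT src = { b.atlasX, b.atlasY, b.atlasX + 32, b.atlasY + 32 };
    D3DXVECTOR3 position(b.x, b.y, 0.0f);
    sprite->Draw(atlasTexture, &src, &center, &position,
                 D3DCOLOR_ARGB(255, 255, 255, 255));
}

sprite->End();    // flushes the whole batch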
The file type you are using does not matter as long as the images are preloaded into textures. Make sure you load them all into textures once when the game/level is loading. Once you have loaded them into textures, it does not matter what format they were originally in.
If you still are not getting the performance you want, try using PIX to profile your application and find where the bottlenecks really are.
Edit:
This is too long to fit in a comment, so I will edit this post.
When I say swapping textures I mean binding them to a texture stage with SetTexture. Each time SetTexture is called there is a small performance hit as it changes the state of the texture stage. Normally this delay is fairly small, but can be bad if DirectX has to pull the texture from system memory to video memory.
ID3DXSprite will reorder the draws that happen between the Begin and End calls for you. This means SetTexture will typically only be called once for each texture, regardless of the order you draw the sprites in.
It is often worth loading small textures into a large one. For example, if it were possible to fit all the small textures into one large one, then the texture stage could just stay bound to that texture for all draws. Normally this will give a noticeable improvement, but testing is the only way to know for sure how much it will help. It would look terrible, but you could just throw in any large texture and pretend it is the combined one to test what performance difference there would be.
I agree with dschaeffer, but would like to add that if you are using a large number of different textures, it may be better to smush them together on a single (or a few) larger textures and adjust the texture coordinates for the different sprites accordingly. Texture state changes cost a lot, and this may speed things up on older systems.

graphics: best performance with floating point accumulation images

I need to speed up some particle system eye candy I'm working on. The eye candy involves additive blending, accumulation, and trails and glow on the particles. At the moment I'm rendering by hand into a floating point image buffer, converting to unsigned chars at the last minute then uploading to an OpenGL texture. To simulate glow I'm rendering the same texture multiple times at different resolutions and different offsets. This is proving to be too slow, so I'm looking at changing something. The problem is, my dev hardware is an Intel GMA950, but the target machine has an Nvidia GeForce 8800, so it is difficult to profile OpenGL stuff at this stage.
I did some very unscientific profiling and found that most of the slow down is coming from dealing with the float image: scaling all the pixels by a constant to fade them out, and converting the float image to unsigned chars and uploading to the graphics hardware. So, I'm looking at the following options for optimization:
Replace floats with uint32's in a fixed point 16.16 configuration
Optimize float operations using SSE2 assembly (image buffer is a 1024*768*3 array of floats)
Use OpenGL Accumulation Buffer instead of float array
Use OpenGL floating-point FBO's instead of float array
Use OpenGL pixel/vertex shaders
Have you any experience with any of these possibilities? Any thoughts, advice? Something else I haven't thought of?
The problem is simply the sheer amount of data you have to process.
Your float buffer is 9 megabytes in size, and you touch the data more than once. Most likely your rendering loop looks somewhat like this:
Clear the buffer
Render something on it (uses reads and writes)
Convert to unsigned bytes
Upload to OpenGL
That's a lot of data that you move around, and the cache can't help you much because the image is much larger than your cache. Let's assume you touch every pixel five times. If so, you move 45 MB of data in and out of the slow main memory. 45 MB does not sound like much data, but consider that almost every memory access will be a cache miss. The CPU will spend most of the time waiting for the data to arrive.
If you want to stay on the CPU to do the rendering there's not much you can do. Some ideas:
Using SSE for non-temporal loads and stores may help, but it will complicate your task quite a bit (you have to align your reads and writes).
Try breaking your rendering up into tiles, e.g. do everything on smaller rectangles (256*256 or so); see the sketch after this list. The idea behind this is that you actually get a benefit from the cache. After you've cleared your rectangle, for example, the entire tile will be in the cache. Rendering and converting to bytes will be a lot faster now because there is no need to fetch the data from the relatively slow main memory anymore.
Last resort: Reduce the resolution of your particle effect. This will give you a good bang for the buck at the cost of visual quality.
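As a sketch of the tiling idea above, with the per-tile helpers standing in for your existing clear/render/convert code:
const int W = 1024, H = 768, TILE = 256;

for (int ty = 0; ty < H; ty += TILE) {
    for (int tx = 0; tx < W; tx += TILE) {
        // Each 256*256 block of floats (~768 KB) is far more cache friendly
        // than sweeping the whole 9 MB image three times.
        clearTile(floatImage, tx, ty, TILE);              // hypothetical helpers
        renderParticlesInTile(floatImage, tx, ty, TILE);
        convertTileToBytes(floatImage, byteImage, tx, ty, TILE);
    }
}
// byteImage is then uploaded once per frame, e.g. with glTexSubImage2D.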
The best solution is to move the rendering onto the graphics card. Render-to-texture functionality is standard these days. It's a bit tricky to get it working with OpenGL because you have to decide which extension to use, but once you have it working, performance is not an issue anymore.
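A minimal setup sketch, using the core framebuffer-object entry points (older drivers expose the same calls with an EXT suffix via GL_EXT_framebuffer_object, which is the extension choice mentioned above):
GLuint fbo = 0, colorTex = 0;

glGenTextures(1, &colorTex);
glBindTexture(GL_TEXTURE_2D, colorTex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
// GL_RGBA16F needs float-texture support; drop to GL_RGBA8 if 8-bit
// channels turn out to be enough.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, 1024, 768, 0,
             GL_RGBA, GL_HALF_FLOAT, NULL);

glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, colorTex, 0);

// Render the particles into colorTex, then sample colorTex when compositing
// and blurring, so the CPU-side float buffer and the per-frame upload go away.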
Btw - do you really need floating point render targets? If you can get away with 3 bytes per pixel you will see a nice performance improvement.
It's best to move the rendering calculation for massive particle systems like this over to the GPU, which has hardware optimized to do exactly this job as fast as possible.
Aaron is right: represent each individual particle with a sprite. You can calculate the movement of the sprites in space (e.g., accumulate their positions per frame) on the CPU using SSE2, but do all the additive blending and accumulation on the GPU via OpenGL. (Drawing sprites additively is easy enough.) You can handle your trails and blur either by doing it in shaders (the "pro" way), rendering to an accumulation buffer and back, or simply generating a bunch of additional sprites on the CPU representing the trail and throwing them at the rasterizer.
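The additive part is just a blend-state change, for example (particleTexture is a placeholder):
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE);   // add src * alpha onto whatever is there
glDepthMask(GL_FALSE);               // particles shouldn't occlude each other

glBindTexture(GL_TEXTURE_2D, particleTexture);
// ... draw one textured quad (or point sprite) per particle ...

glDepthMask(GL_TRUE);
glDisable(GL_BLEND);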
Try to replace the manual code with sprites: An OpenGL texture with an alpha of, say, 10%. Then draw lots of them on the screen (ten of them in the same place to get the full glow).
If by "manual" you mean that you are using the CPU to poke pixels, I think pretty much anything you can do where you draw textured polygons using OpenGL instead will represent a huge speedup.
