How to optimize performance in XNA for Windows Phone - windows-phone-7

In one of the levels of my game I have quite a few big textures to draw. To give you an idea of what I'm talking about, these are the texture sizes and quantities in the level:
448x420 - 3, 315x400 - 2, 305x429 - 3, 366x167 - 1, 356x265 - 4, 401x343 - 2, 387x251 - 1, plus about 20 elements with much smaller textures.
The performance of my original implementation was 20 fps. Then I created a single texture map (atlas) that contained all the textures used in the level, which gave me about 3-4 extra fps. 24 fps is still not enough for me; what else can I try to optimize performance?

Some things I could think of off the top of my head:
Does the device have a dedicated GPU and memory, or does it use shared memory?
Have you tried expanding your textures to power-of-two sizes (although that might only improve loading times)?
Are you using mipmaps or other types of texture filtering?
How many texels is it trying to fetch?
Have you tried using a 1x1 texture, just to see whether texturing is actually your bottleneck?
In my experience, texel fetches on mobile devices are quite heavy. Be sure to decrease the size of your textures as much as possible. Look at your project: if an object only takes up about 10% of the screen, does it really need a texture of 128x128 or bigger?
I don't know what kind of project and target device you use, but textures of 448x420 (non power of two?!) seem like overkill for a mobile game. In fact, I'd reckon you should try to stay below a total combined texture usage of something like 1024x1024.
Do I understand correctly that this is your total texture size?
(448 * 420 * 3) + (315 * 400 * 2) + (305 * 429 * 3) + (366 * 167) + (356 * 265 * 4) + (401 * 343 * 2) + (387 * 251) = 2,019,720 pixels (ignoring the ~20 smaller elements you mentioned)
Which means, assuming 32-bit textures, you are in theory using roughly 8 MB of RAM for texture storage alone (not even accounting for mipmaps or for XNA possibly up-scaling to power-of-two sizes; I'm also not sure whether XNA uses DDS/DXT compression internally). That can still add up quickly on a mobile device once the remaining elements are included.
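If you want to double-check that kind of estimate yourself, here is a minimal sketch (plain C++ rather than XNA; the power-of-two rounding and the mipmap factor are assumptions about worst-case behaviour, not measured XNA internals):

    #include <cstdio>
    #include <cstddef>

    // Rough worst-case estimate: round each dimension up to a power of two (as
    // some runtimes do) and add ~33% for a full mipmap chain. Illustrative only.
    static std::size_t NextPow2(std::size_t v) {
        std::size_t p = 1;
        while (p < v) p <<= 1;
        return p;
    }

    static std::size_t TextureBytes(std::size_t w, std::size_t h,
                                    std::size_t bytesPerPixel, bool mipmaps) {
        std::size_t base = NextPow2(w) * NextPow2(h) * bytesPerPixel;
        return mipmaps ? base + base / 3 : base;  // full mip chain adds about 1/3
    }

    int main() {
        // The level's large textures: {width, height, count}
        const std::size_t textures[][3] = {
            {448, 420, 3}, {315, 400, 2}, {305, 429, 3}, {366, 167, 1},
            {356, 265, 4}, {401, 343, 2}, {387, 251, 1},
        };
        std::size_t total = 0;
        for (const auto& t : textures)
            total += t[2] * TextureBytes(t[0], t[1], 4 /* 32-bit */, true);
        std::printf("worst case: %.1f MB\n", total / (1024.0 * 1024.0));
        return 0;
    }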
Hope this helps you find your bottleneck.

You should try profiling your application, to check if the issues are actually in the graphics rendering or somewhere else.
There are some tools available for this, although most have limitations when working with WP7. The first recommendations would be Microsoft's Windows Phone profiler and PIX for graphics monitoring, both mentioned in the linked post. There is also the XNA Framework Remote Performance Monitor, but as far as I can tell there is no WP7-compatible version.

Related

How to create textures from large images in opengl (bigger than the MAX_TEXTURE_SIZE)

I've found that the maximum texture size my OpenGL implementation supports is 8192, but the image I'm working with is 16997x15931. As you can see in this link, I've completed the class COpenGLControl and customized it to work with a smaller 7697x7309 image, and added different navigation tasks for it.
Render an outlined red rectangle on top a 2D texture in OpenGL
But now, in the last stages of the work, I've decided to change the part that applies the texture so it can handle images bigger than 8192.
Questions:
Is it possible with my OpenGL version?
What concepts should I study: mipmaps, multiple texturing?
Will it improve the performance of the code?
Right now my program uses 271 MB of RAM just to show the smaller image (7697x7309), and I'm going to add an image-processing/filtering task to it. I have put a lot of effort into optimizing that code, but it still uses 376 MB of RAM for the 7697x7309 image (it is already written as a console application and will be combined with this project). So I expect the final project to use up to 700 MB of RAM for images around 7000x7000. Obviously, for the bigger image (16997x15931) the RAM usage will be a lot higher!
So I'm looking for a concept to handle images bigger than MAX_TEXTURE_SIZE and also to optimize the program's performance.
More Questions:
What concepts should I study in OpenGL to achieve the above goal?
Can you explain a little about the concept you suggest?
I've asked the question on Game Development too, but decided to repeat it here since it may get more viewers. As soon as I get an answer, I will delete the question from one of the two sites, so don't worry about the cross-post.
I will try to sum up my comments for the original question.
Know your actual OpenGL version: maybe you can load some modern extensions, or even work against a more recent version of OpenGL.
If possible, you can take a look at sparse textures (megatextures): ARB_sparse_texture or AMD_sparse_texture.
To reduce memory you can use some texture compression:
How to: load DDS files in OpenGL.
Another simple idea: you can split the huge texture into smaller ones (e.g. from 16k x 16k into four 8k x 8k tiles) and render four quads; see the sketch after this list.
Maybe you can use OpenCL or CUDA to do the processing work?
Regarding mipmaps: a mipmap chain is a set of progressively smaller versions of your input texture. Mipmaps improve performance and the final filtering quality, but a texture with a full mipmap chain needs about 33% more memory. In your case they could be very helpful: when you look at a wall from a huge distance you don't need the full (large) texture, a small version of it is enough. g-truc on mipmaps
In general there are a lot of options, but it depends on your experience which is simplest and fastest to implement.
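For the splitting idea, a minimal sketch of the upload side (assuming the image is already decoded in memory as tightly packed RGBA and that a GL context exists); rendering then draws one quad per tile:

    #include <GL/gl.h>
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Split a large, tightly packed RGBA image into tiles no bigger than
    // GL_MAX_TEXTURE_SIZE and upload each tile as its own texture.
    std::vector<GLuint> UploadTiled(const std::uint8_t* pixels, int imgW, int imgH) {
        GLint maxSize = 0;
        glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxSize);   // e.g. 8192 on the asker's GPU

        std::vector<GLuint> tiles;
        std::vector<std::uint8_t> tileBuf;
        for (int y = 0; y < imgH; y += maxSize) {
            for (int x = 0; x < imgW; x += maxSize) {
                const int w = std::min(imgW - x, (int)maxSize);
                const int h = std::min(imgH - y, (int)maxSize);

                // Copy the sub-rectangle into a contiguous buffer.
                tileBuf.resize((std::size_t)w * h * 4);
                for (int row = 0; row < h; ++row) {
                    const std::uint8_t* src =
                        pixels + ((std::size_t)(y + row) * imgW + x) * 4;
                    std::copy(src, src + (std::size_t)w * 4,
                              tileBuf.data() + (std::size_t)row * w * 4);
                }

                GLuint tex = 0;
                glGenTextures(1, &tex);
                glBindTexture(GL_TEXTURE_2D, tex);
                glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
                glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
                glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
                             GL_RGBA, GL_UNSIGNED_BYTE, tileBuf.data());
                tiles.push_back(tex);
            }
        }
        return tiles;
    }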

Huge memory usage when loading multiple animations and textures with Cocos2d, how to solve it

I am working on a game that needs to load 27 texture atlases (each one 1024 x 1024) before entering the game scene,
but sometimes my game crashes because it receives memory warnings.
I know 27 texture atlases will use:
4 * 27 * 1024 * 1024 bytes = 108 MB of memory,
which is a huge amount, but I really need to load them before entering the game.
Is there any way to solve my issue?
Any ideas will be very much appreciated!
BTW:
I am using cocos2d 1.0.1
Best suggestion is to review your design, and the 'need' for preloading all these textures. I tend to pre-load only the textures that are most frequently used (animations and static map objects).
For example, I have textures for animating walks on a map for 16 character classes. I regrouped the 'idle' animations into 4 textures and preload these, because initially, when a soldier enters the scene, it idles. The moving animations are in separate textures that are loaded in real time, as a function of the direction of travel, for each character class in movement. When the characters stop walking (idle), I remove unused textures from the cache, as well as unused sprite frames.
Also, there are other avenues for memory management. You could use a 16-bit format for certain textures (RGBA8888 is the default). You may also gain by converting to the compressed PVR format (again lossy, but it could be fine for some textures).
Look here and there to learn more about picture formats in cocos2d, and their relationship to memory consumption (as well as load and rendering speed). But once again, before you start optimizing, make certain you have no alternative to the preload-everything approach.
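For the pixel-format suggestion, a sketch in cocos2d-x (C++) syntax; since the question uses cocos2d-iphone 1.0.1, treat the exact call names as an assumption (the equivalent Objective-C calls exist there) and the file names as placeholders:

    #include "cocos2d.h"
    #include <cstdio>
    USING_NS_CC;

    // Called before building the game scene. Loading at RGBA4444 halves the
    // in-memory size of each atlas (about 54 MB instead of 108 MB for 27 of
    // them), at the cost of some colour banding.
    void PreloadAtlases() {
        CCTexture2D::setDefaultAlphaPixelFormat(kCCTexture2DPixelFormat_RGBA4444);
        for (int i = 1; i <= 27; ++i) {
            char name[64];
            std::sprintf(name, "atlas_%02d.png", i);          // hypothetical file names
            CCTextureCache::sharedTextureCache()->addImage(name);
        }
        // Restore the default if other scenes need full-quality textures.
        CCTexture2D::setDefaultAlphaPixelFormat(kCCTexture2DPixelFormat_RGBA8888);
    }

    // Call when leaving the scene to evict whatever is no longer referenced.
    void ReleaseUnusedAtlases() {
        CCSpriteFrameCache::sharedSpriteFrameCache()->removeUnusedSpriteFrames();
        CCTextureCache::sharedTextureCache()->removeUnusedTextures();
    }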
Use JPG instead of PNG: JPG has no transparency, but you can restore transparency with a separate alpha-mask image of the same picture. This should reduce the size to almost half of what you are using now.

Slow performance on Android tablet with SurfaceView

I'm developing a card game in Android using SurfaceView and canvas to draw the UI.
I've tried to optimize everything as much as possible but I still have two questions:
During the game I'll need to draw 40 bitmaps (the 40 cards in the Italian deck). Is it better to create all the bitmaps in the onCreate method of my customized SurfaceView (storing them in an array), or to create them as needed (for example, every time the user gets a new card)?
I'm able to get over 90 fps on an old Samsung I5500 (528 MHz, QVGA screen), 60 fps on an Optimus Life (800 MHz, HVGA screen), and 60 fps on a Nexus One/Motorola Razr (1 GHz and dual-core 1 GHz, with WVGA and qHD screens). But when I run the game on an Android tablet (Motorola Xoom, dual-core 1 GHz and 1 GB of RAM) I only get 30-40 fps. How is it possible that a 528 MHz CPU with 256 MB of RAM can handle 90+ fps while a dual-core processor can't handle 60 fps? I'm not seeing any GC calls at runtime.
EDIT: Just to clarify, I've tried both ARGB_8888 and RGB_565 without any change in performance...
Any suggestions?
Thanks
Some points for you to consider:
It is recommended not to create new objects while your game is running, otherwise, you may get unexpected garbage collections.
Your FPS numbers don't sound right; you may have measurement errors. However, my guess is that you are resizing the images to fit the screen, which increases the memory usage of your game and may cause slow rendering times on tablets.
You can use profiling tools to confirm: TraceView
OpenGL would be much faster
Last tip: don't draw overlapping cards if you can; draw only the visible ones.
Good luck.
OK, so it's better to create the bitmaps in the onCreate method; that is what I'm doing right now.
They are OK. I believe the 60 fps cap on some devices is just a restriction imposed by the manufacturers, since there is no advantage in going above 60 fps (I'm making this assumption because it makes no difference whether I render 1 card, 10 cards, or no card: the onDraw method is called 60 times per second, but if I add, say, 50-100 cards it drops accordingly). I don't resize any card, because I use the proper folder (mdpi, hdpi, etc.) for each device and I use the image at its exact size, without resizing.
I've tried looking at it, but from what I understand all of the app's execution time is spent drawing the bitmaps, not resizing them or updating their positions. Here it is:
I know, but it would add complexity to the development, and I believe that using a canvas for 7 cards on the screen should be just fine.
I don't draw every card of the deck; I just swap bitmaps as needed :)
UPDATE: I've tried running the game on a Xoom 2, a Galaxy Tab 7 Plus and an Asus Transformer Prime, and it runs just fine at 60 fps. Could it be a problem specific to Tegra 2 devices?

OpenGL performance on rendering "virtual gallery" (textures)

I have a considerable number (120-240) of 640x480 images that will be displayed as textured flat surfaces (4-vertex polygons) in a 3D environment. About 30-50% of them will be visible in a given frame. It is possible for them to overlap. Nothing else will be present in the environment.
The question is: will a modern and/or a few-years-old GPU (let's say a Radeon 9550) cope with that, and what frame rate can I expect? I'm aiming for 20 FPS, but 30-40 would be nice. Would changing the resolution to 320x240 make that more likely?
I do not have any previous experience with performance issues of 3D graphics on modern GPUs, and unfortunately I must make a design choice. I don't want to waste time on doing something that couldn't have worked :-)
Assuming you have RGB textures, that would be 640*480*3*120 bytes = 105 MB minimum of texture data, which should fit in the VRAM of more recent graphics cards without swapping, so this won't be an issue. However, texture lookups might get a bit problematic, but that is hard for me to judge without trying. Given that you only need to process 50% of the 105 MB, that is about 50 MB (a very rough estimate); targeting 20 FPS means 20 * 50 MB/s = about 1 GB/s of throughput. This should be possible even on older hardware.
Reading the specs of an older Radeon 9600 XT, it lists a peak fill rate of 2000 Mpixels/s, and if I'm not mistaken you require far less than 100 Mpixels/s. Peak memory bandwidth is specified as 9.6 GB/s, while you'd need about 1 GB/s (as explained above).
I would argue that this should be possible if done correctly; current hardware, especially, should have no problem at all.
Anyway, you should simply try it out: loading some 120 random textures and displaying them on 120 quads can be done in very few lines of code with hardly any effort.
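A throwaway test along those lines might look like this (a sketch using old fixed-function OpenGL, which matches the Radeon 9550 era; window and context setup via GLUT or similar is assumed, and the pixel data is just noise):

    #include <GL/gl.h>
    #include <cstddef>
    #include <cstdlib>
    #include <vector>

    const int kNumTextures = 120;
    GLuint textures[kNumTextures];

    // Create 120 placeholder 640x480 RGB textures filled with random bytes.
    // Note: 640x480 is not a power of two; on pre-GL-2.0 hardware you would
    // pad to e.g. 1024x512 (see the next answer about power-of-two sizes).
    void CreateTestTextures() {
        std::vector<unsigned char> pixels(640 * 480 * 3);
        glGenTextures(kNumTextures, textures);
        for (int i = 0; i < kNumTextures; ++i) {
            for (std::size_t j = 0; j < pixels.size(); ++j)
                pixels[j] = static_cast<unsigned char>(std::rand());
            glBindTexture(GL_TEXTURE_2D, textures[i]);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, 640, 480, 0,
                         GL_RGB, GL_UNSIGNED_BYTE, pixels.data());
        }
    }

    // Draw one textured quad per texture in a simple grid; call every frame
    // and watch the frame rate.
    void DrawTestQuads() {
        glEnable(GL_TEXTURE_2D);
        for (int i = 0; i < kNumTextures; ++i) {
            glBindTexture(GL_TEXTURE_2D, textures[i]);
            glPushMatrix();
            glTranslatef((float)(i % 12), (float)(i / 12), -20.0f);
            glBegin(GL_QUADS);
            glTexCoord2f(0, 0); glVertex2f(-0.5f, -0.5f);
            glTexCoord2f(1, 0); glVertex2f( 0.5f, -0.5f);
            glTexCoord2f(1, 1); glVertex2f( 0.5f,  0.5f);
            glTexCoord2f(0, 1); glVertex2f(-0.5f,  0.5f);
            glEnd();
            glPopMatrix();
        }
    }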
First of all, you should realize that texture dimensions should normally be powers of two, so if you can change them, something like 512x256 (for example) would be a better starting point.
From that, you can create MIPmaps of the original, which are simply versions of the original scaled down by powers of two, so if you started with 512x256, you'd then create versions at 256x128, 128x64, 64x32, 32x16, 16x8, 8x4, 4x2, 2x1 and 1x1. When you've done this, OpenGL can/will select the "right" one for the size it'll show up at in the final display. This generally reduces the work (and improves quality) in scaling the texture to the desired size.
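If you don't want to scale the images yourself, GLU can build the whole chain for you (a sketch assuming the GLU library is available; on newer OpenGL you would use glGenerateMipmap instead):

    #include <GL/gl.h>
    #include <GL/glu.h>

    // Upload with a full mipmap chain and enable trilinear filtering so OpenGL
    // picks the appropriately sized level at draw time.
    void UploadWithMipmaps(GLuint tex, int w, int h, const unsigned char* rgb) {
        glBindTexture(GL_TEXTURE_2D, tex);
        gluBuild2DMipmaps(GL_TEXTURE_2D, GL_RGB8, w, h,
                          GL_RGB, GL_UNSIGNED_BYTE, rgb);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    }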
The obvious sticking point with that would be running out of texture memory. If memory serves, in the 9550 timeframe you could probably expect 256 MB of on-board memory, which would be about sufficient, but chances are pretty good that some of the textures would be in system RAM. That overflow would probably be fairly small though, so it probably won't be terribly difficult to maintain the kind of framerate you're hoping for. If you were to add a lot more textures, however, it would eventually become a problem. In that case, reducing the original size by 2 in each dimension (for example) would reduce your memory requirement by a factor of 4, which would make fitting them into memory a lot easier.

graphics: best performance with floating point accumulation images

I need to speed up some particle system eye candy I'm working on. The eye candy involves additive blending, accumulation, and trails and glow on the particles. At the moment I'm rendering by hand into a floating point image buffer, converting to unsigned chars at the last minute then uploading to an OpenGL texture. To simulate glow I'm rendering the same texture multiple times at different resolutions and different offsets. This is proving to be too slow, so I'm looking at changing something. The problem is, my dev hardware is an Intel GMA950, but the target machine has an Nvidia GeForce 8800, so it is difficult to profile OpenGL stuff at this stage.
I did some very unscientific profiling and found that most of the slow down is coming from dealing with the float image: scaling all the pixels by a constant to fade them out, and converting the float image to unsigned chars and uploading to the graphics hardware. So, I'm looking at the following options for optimization:
Replace floats with uint32's in a fixed point 16.16 configuration
Optimize float operations using SSE2 assembly (image buffer is a 1024*768*3 array of floats)
Use OpenGL Accumulation Buffer instead of float array
Use OpenGL floating-point FBO's instead of float array
Use OpenGL pixel/vertex shaders
Have you any experience with any of these possibilities? Any thoughts, advice? Something else I haven't thought of?
The problem is simply the sheer amount of data you have to process.
Your float buffer is 9 megabytes in size, and you touch the data more than once. Most likely your rendering loop looks somewhat like this:
Clear the buffer
Render something on it (uses reads and writes)
Convert to unsigned bytes
Upload to OpenGL
That's a lot of data to move around, and the cache can't help you much because the image is much larger than the cache. Let's assume you touch every pixel five times; that means you move 45 MB of data in and out of slow main memory. 45 MB does not sound like much, but consider that almost every memory access will be a cache miss. The CPU will spend most of its time waiting for the data to arrive.
If you want to stay on the CPU to do the rendering there's not much you can do. Some ideas:
Using SSE non-temporal loads and stores may help, but they will complicate your task quite a bit (you have to align your reads and writes); see the sketch after this list.
Try breaking your rendering up into tiles, e.g. do everything on smaller rectangles (256*256 or so). The idea behind this is that you actually get a benefit from the cache: after you've cleared your rectangle, for example, the entire tile will be in the cache. Rendering and converting to bytes will then be a lot faster because there is no need to fetch the data from the relatively slow main memory anymore.
Last resort: Reduce the resolution of your particle effect. This will give you a good bang for the buck at the cost of visual quality.
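Here is the sketch mentioned above for the SSE idea: fading the float buffer with aligned loads and non-temporal stores. It assumes the buffer is 16-byte aligned and its length is a multiple of 4 floats (1024*768*3 satisfies the latter):

    #include <xmmintrin.h>   // SSE: _mm_load_ps, _mm_mul_ps, _mm_stream_ps
    #include <cstddef>

    // Multiply every float in the buffer by 'fade' (e.g. 0.95f per frame).
    // _mm_stream_ps writes around the cache, so the faded pixels don't evict
    // the rest of your working set; the buffer must be 16-byte aligned.
    void FadeBuffer(float* buffer, std::size_t count, float fade) {
        const __m128 k = _mm_set1_ps(fade);
        for (std::size_t i = 0; i < count; i += 4) {
            __m128 px = _mm_load_ps(buffer + i);          // aligned load of 4 floats
            _mm_stream_ps(buffer + i, _mm_mul_ps(px, k)); // non-temporal store
        }
        _mm_sfence();  // make the streaming stores globally visible
    }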
The best solution is to move the rendering onto the graphics card. Render-to-texture functionality is standard these days. It's a bit tricky to get working with OpenGL because you have to decide which extension to use, but once you have it working, performance is no longer an issue.
By the way, do you really need floating-point render targets? If you can get away with 3 bytes per pixel you will see a nice performance improvement.
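A minimal render-to-texture sketch (using the core GL 3.0 framebuffer entry points via GLEW; on older drivers you would use the EXT_framebuffer_object variants instead, which is the "decide which extension to use" part):

    #include <GL/glew.h>

    GLuint fbo = 0, colorTex = 0;

    // Create a floating-point colour target and attach it to a framebuffer object.
    void CreateRenderTarget(int w, int h) {
        glGenTextures(1, &colorTex);
        glBindTexture(GL_TEXTURE_2D, colorTex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, w, h, 0, GL_RGBA, GL_FLOAT, 0);

        glGenFramebuffers(1, &fbo);
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, colorTex, 0);
        if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
            // Float targets unsupported: fall back to GL_RGBA8 above.
        }
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
    }

    // Per frame: render the particles into the texture, then draw that texture
    // to the screen (several times, scaled/offset, if you keep the multi-pass glow).
    void DrawFrame() {
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        // ... draw additive particle sprites here ...
        glBindFramebuffer(GL_FRAMEBUFFER, 0);
        glBindTexture(GL_TEXTURE_2D, colorTex);
        // ... draw a full-screen textured quad sampling colorTex ...
    }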
It's best to move the rendering calculation for massive particle systems like this over to the GPU, which has hardware optimized to do exactly this job as fast as possible.
Aaron is right: represent each individual particle with a sprite. You can calculate the movement of the sprites in space (eg, accumulate their position per frame) on the CPU using SSE2, but do all the additive blending and accumulation on the GPU via OpenGL. (Drawing sprites additively is easy enough.) You can handle your trails and blur either by doing it in shaders (the "pro" way), rendering to an accumulation buffer and back, or simply generate a bunch of additional sprites on the CPU representing the trail and throw them at the rasterizer.
Try to replace the manual code with sprites: an OpenGL texture with an alpha of, say, 10%. Then draw lots of them on the screen (ten in the same place to get the full glow).
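The core of that is just additive blending, for example (a fixed-function OpenGL sketch; the Particle struct and the dot texture are placeholders for your own data):

    #include <GL/gl.h>
    #include <vector>

    struct Particle { float x, y, z, size, r, g, b; };  // illustrative only

    // Draw all particles as additively blended textured quads. Each overlapping
    // sprite brightens the result, which gives the accumulation/glow effect.
    void DrawParticles(GLuint particleTex, const std::vector<Particle>& particles) {
        glEnable(GL_TEXTURE_2D);
        glEnable(GL_BLEND);
        glBlendFunc(GL_SRC_ALPHA, GL_ONE);   // additive: never darkens what's behind
        glDepthMask(GL_FALSE);               // transparent sprites shouldn't write depth
        glBindTexture(GL_TEXTURE_2D, particleTex);
        glBegin(GL_QUADS);
        for (const Particle& p : particles) {
            glColor4f(p.r, p.g, p.b, 0.1f);  // ~10% alpha, as suggested above
            glTexCoord2f(0, 0); glVertex3f(p.x - p.size, p.y - p.size, p.z);
            glTexCoord2f(1, 0); glVertex3f(p.x + p.size, p.y - p.size, p.z);
            glTexCoord2f(1, 1); glVertex3f(p.x + p.size, p.y + p.size, p.z);
            glTexCoord2f(0, 1); glVertex3f(p.x - p.size, p.y + p.size, p.z);
        }
        glEnd();
        glDepthMask(GL_TRUE);
        glDisable(GL_BLEND);
    }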
If by "manual" you mean that you are using the CPU to poke pixels, I think pretty much anything where you draw textured polygons with OpenGL instead will represent a huge speedup.
