I need to visualize a design in a Windows application, and therefore need to draw diagonal lines very fast. I tried working with GDI+ (because I need transparency), and drawing diagonal lines is about 10 times slower than drawing vertical/horizontal lines. It sometimes takes me about 400 ms to draw 2000 diagonals that cross the screen.
After this I tested Direct2D, which was about 2x faster than GDI+, but still not nearly fast enough. Now I am starting to look at OpenGL to draw 2D graphics; there I would look at the scene from above and use an orthographic projection.
Can anybody tell me the right way to draw high-speed diagonals?
Regards, Pete
You could manually rasterize the lines to a pixel buffer (simply a 2D integer array), preferably in a faster language (e.g. an external DLL in C), then blit the buffer as a bitmap to your window (GDI or DirectDraw are sufficient).
A good procedure for this would be Bresenham's line algorithm: https://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm.
If you need anti-aliasing: https://en.wikipedia.org/wiki/Xiaolin_Wu%27s_line_algorithm.
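For reference, here is a minimal sketch of Bresenham's algorithm in C++, rasterizing into a 32-bit pixel buffer; buffer, width, and height stand in for whatever backing store you blit to the window:

#include <cstdint>
#include <cstdlib>

// Minimal Bresenham sketch: plots a line of the given color into a
// width*height buffer of 32-bit pixels, clipping to the buffer bounds.
void DrawLine(uint32_t* buffer, int width, int height,
              int x0, int y0, int x1, int y1, uint32_t color)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;                                  // error term
    for (;;) {
        if (x0 >= 0 && x0 < width && y0 >= 0 && y0 < height)
            buffer[y0 * width + x0] = color;            // plot pixel
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }          // step in x
        if (e2 <= dx) { err += dx; y0 += sy; }          // step in y
    }
}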
I used this method for a software renderer project and achieved a 2x performance increase (as you did with Direct2D, though I'm still using GDI+) when I migrated from VB.NET to C. I was able to get 40+ FPS at 800x600 on an old 1.7 GHz Pentium machine, and ~25 even with GDI+, so I'm not sure why performance is inadequate in your case.
For the highest performance, use OpenGL or Direct3D. You'll be able to take advantage of hardware-accelerated drawing at the expense of more code to get things set up. You should easily be able to draw tens of thousands of lines in under 16 ms.
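As a rough illustration of why this is fast (a sketch assuming a legacy OpenGL context; verts is an array you fill with two (x, y) endpoints per line), the key is submitting every line in a single draw call instead of one call per line:

// Sketch: all 2000 diagonals in one draw call. "verts" holds
// lineCount * 2 vertices, two floats (x, y) each.
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(2, GL_FLOAT, 0, verts);
glDrawArrays(GL_LINES, 0, lineCount * 2);   // one call for all lines
glDisableClientState(GL_VERTEX_ARRAY);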
I am trying to understand the whole 2D accelerated rendering process using SDL 2.0.
So my question is: which would be the most efficient way to draw circles on the screen, and why?
Some ways would be:
First, create a software surface, draw the necessary pixels on that surface, create a texture out of that surface, and lastly copy that texture to the rendering target.
Another implementation would be to draw a circle using multiple calls to SDL_RenderDrawLine. I think this is the way it is implemented in SDL2_gfx.
Or is there a more efficient way to do all of this?
Take this question more generally: suppose I wanted to draw other shapes manually which probably couldn't be rendered easily with the 2D rendering API that SDL provides (using draw line or rectangle).
With the example of circles, this is a fairly complicated question; the visual quality you wish to achieve is what will drive performance. The cost of drawing lots of short lines varies vastly based on how close to a true circle you wish to get. If you are happy to use, say, 60 lines, small shapes will look nearly seamless, but scaled up they will begin to appear not to be circles; performance will likely be better (depending on the user's hardware). Note also that SDL_RenderDrawLines will be much, much faster for many lines, as it avoids lots of context switches between rendering calls.
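For illustration, a minimal sketch of the line-segment approach (assuming an existing SDL_Renderer* named renderer and circle parameters cx, cy, r):

#include <cmath>

// Sketch: approximate a circle with N line segments in one call.
const int N = 60;                          // number of segments
const double kPi = 3.14159265358979;
SDL_Point points[N + 1];
for (int i = 0; i <= N; ++i) {
    double a = 2.0 * kPi * i / N;
    points[i].x = (int)(cx + r * cos(a));  // cx, cy, r: your circle
    points[i].y = (int)(cy + r * sin(a));
}
SDL_RenderDrawLines(renderer, points, N + 1);  // one batched call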
However, if you need a very accurate circle, with thousands of lines to get a good approximation, it will be faster to simply scale and blit a bitmap. This will also give the circle a 'smoother' feel.
In my opinion, the hardware-accelerated render API does not have much use outside of some special cases such as graph rendering and perhaps very simple GUI drawing. For anything more complex I would usually use bitmap-based drawing.
As for the second part, it again depends on the accuracy of any arcs you need to draw. If you can easily approximate the shape with a few tens of lines it will be fast; otherwise the pixel method is better.
I am creating a tile-based 2D game as a way of learning basic "modern" OpenGL concepts. I'm using shaders with OpenGL 2.1, and am familiar with the rendering pipeline and how to actually draw geometry on-screen. What I'm wondering is the best way to organize a tilemap so it renders quickly and efficiently. I have thought of several potential methods:
1.) Store the quad representing a single tile (vertices and texture coordinates) in a VBO and render each tile with a separate draw* call, translating it to the correct position onscreen and using glUniform2i to give the location in the texture atlas for that particular tile;
2.) Keep a VBO containing every tile onscreen (already-computed screen coordinates and texture atlas coordinates), using glBufferSubData to update the tiles every frame but issuing a single draw* call;
3.) Keep VBOs containing static NxN "chunks" of tiles, drawing however many chunks of tiles are at least partially visible onscreen and translating them each into position.
*I'd like to stay away from the last option if possible, unless rendering 64x64 chunks turns out to be efficient enough. Tiles are loaded into memory in blocks of that size, and even though only about 20x40 tiles are visible onscreen at a time, I would have to render up to four chunks at once. This method would also complicate my code in several other ways.
So, which of these is the most efficient way to render a screen of tiles? Are there any better methods?
You could do any one of these and they would probably be fine; what you're proposing to render is very, very simple.
#1 will definitely be worse in principle than the other options, because you would be drawing many extremely simple “models” rather than letting the GPU do a whole lot of batch work on one draw call. However, if you have only 20×40 = 800 tiles visible on screen at once, then this is a trivial amount of work for any modern CPU and GPU (unless you're doing some crazy fragment shader).
I recommend you go with whichever is simplest to program for you, so that you can continue work on your game. I imagine this would be #1, or possibly #2. If and when you find yourself with a performance problem, do whichever of #2 or #3 (64×64 sounds like a fine chunk size) lets you spend the least CPU time on your program's part of drawing (i.e. updating the buffer(s)).
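For what it's worth, a sketch of what #2 might look like under OpenGL 2.1 (TileVertex, tileVbo, tileVerts, and the attribute locations are placeholders for your own names; two triangles per tile):

// Sketch of option #2: one VBO holding every visible tile.
// Rebuild the CPU-side tileVerts array each frame, then do a single
// upload and a single draw call.
glBindBuffer(GL_ARRAY_BUFFER, tileVbo);
glBufferSubData(GL_ARRAY_BUFFER, 0,
                tileCount * 6 * sizeof(TileVertex), tileVerts);
glEnableVertexAttribArray(posLoc);
glEnableVertexAttribArray(uvLoc);
glVertexAttribPointer(posLoc, 2, GL_FLOAT, GL_FALSE,
                      sizeof(TileVertex), (void*)0);                   // screen position
glVertexAttribPointer(uvLoc, 2, GL_FLOAT, GL_FALSE,
                      sizeof(TileVertex), (void*)(2 * sizeof(float))); // atlas coords
glDrawArrays(GL_TRIANGLES, 0, tileCount * 6);                          // all tiles at once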
I've been recently learning modern OpenGL myself, through OpenGL ES 2.0 on Android. The OpenGL ES 2.0 Programming Guide recommends an "array of structures", that is,
"Store vertex attributes together in a single buffer. The structure represents all attributes of a vertex and we have an array of these attributes per vertex."
While this may seem like it would initially consume a lot of space, it allows for efficient rendering using VBOs and flexibility in texture mapping each tile. I recently rendered a 20x20 tile hex grid on a Droid 2 using interleaved arrays containing vertex, normal, color, and texture data. So far things are running smoothly.
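To make that concrete, a sketch of such an interleaved layout (field names and attribute locations are illustrative):

#include <cstddef>   // offsetof

// "Array of structures": every attribute of a vertex lives together,
// and the stride between vertices is simply sizeof(Vertex).
struct Vertex {
    float pos[3];     // position
    float normal[3];  // normal
    float color[4];   // RGBA
    float uv[2];      // texture coordinates
};

glBindBuffer(GL_ARRAY_BUFFER, vbo);
glVertexAttribPointer(posLoc, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void*)offsetof(Vertex, pos));
glVertexAttribPointer(uvLoc, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void*)offsetof(Vertex, uv));
// ...normal and color follow the same pattern.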
I'm trying to create a platformer game, and I am taking various sprite blocks and piecing them together in order to draw the level. This requires drawing a large number of sprites on the screen every single frame. A good computer has no problem drawing all the sprites, but it starts to impact performance on older computers. Since this is NOT a big game, I want it to be able to run on almost any computer. Right now, I am using the following DirectX function to draw my sprites:
D3DXVECTOR3 center(0.0f, 0.0f, 0.0f);
D3DXVECTOR3 position(static_cast<float>(x), static_cast<float>(y), z);
(my LPD3DXSPRITE object)->Draw((sprite texture pointer), NULL, &center, &position, D3DCOLOR_ARGB(a, r, g, b));
Is there a more efficient way to draw these pictures on the screen? Is there a way that I can use less complex picture files (I'm using regular PNGs right now) to speed things up?
To sum it up: what is the most performance-friendly way to draw sprites in DirectX? Thanks!
The ID3DXSprite interface you are using is already pretty efficient. Make sure all your sprite draw calls happen in one batch if possible, between the sprite Begin and End calls. This allows the sprite interface to arrange the draws in the most efficient way.
For extra performance you can load multiple smaller textures into one larger texture and use texture coordinates to pull them out. This way textures don't have to be swapped as frequently. See:
http://nexe.gamedev.net/directknowledge/default.asp?p=ID3DXSprite
The file type you are using for the textures does not matter as long as they are preloaded into textures. Make sure you load them all into textures once when the game/level is loading. Once you have loaded them into textures, it does not matter what format they were originally in.
If you still are not getting the performance you want, try using PIX to profile your application and find where the bottlenecks really are.
Edit:
This is too long to fit in a comment, so I will edit this post.
When I say swapping textures I mean binding them to a texture stage with SetTexture. Each time SetTexture is called there is a small performance hit as it changes the state of the texture stage. Normally this delay is fairly small, but can be bad if DirectX has to pull the texture from system memory to video memory.
ID3DXSprite will reorder the draws that are between Begin and End calls for you. This means SetTexture will typically only be called once for each texture, regardless of the order you draw them in.
It is often worth loading small textures into a large one. For example, if it were possible to fit all the small textures into one large one, the texture stage could just stay bound to that texture for all draws. Normally this will give a noticeable improvement, but testing is the only way to know for sure how much it will help. It would look terrible, but you could throw in any large texture and pretend it is the combined one to test what performance difference there would be.
I agree with dschaeffer, but would like to add that if you are using a large number of different textures, it may be better to smush them together onto a single (or a few) larger textures and adjust the texture coordinates of the different sprites accordingly. Texture state changes cost a lot, and this may speed things up on older systems.
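A sketch of how the two suggestions combine (atlasTexture and the sub-rectangle are placeholders; the Begin flags are the standard D3DX ones):

// Sketch: batch every draw between Begin/End, let D3DX sort by
// texture, and pull each sprite out of one atlas via the source RECT.
sprite->Begin(D3DXSPRITE_ALPHABLEND | D3DXSPRITE_SORT_TEXTURE);
RECT src = { 0, 0, 32, 32 };             // sprite's rectangle in the atlas
D3DXVECTOR3 pos(static_cast<float>(x), static_cast<float>(y), z);
sprite->Draw(atlasTexture, &src, NULL, &pos, D3DCOLOR_ARGB(a, r, g, b));
// ...more Draw calls here...
sprite->End();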
I need to speed up some particle system eye candy I'm working on. The eye candy involves additive blending, accumulation, and trails and glow on the particles. At the moment I'm rendering by hand into a floating point image buffer, converting to unsigned chars at the last minute then uploading to an OpenGL texture. To simulate glow I'm rendering the same texture multiple times at different resolutions and different offsets. This is proving to be too slow, so I'm looking at changing something. The problem is, my dev hardware is an Intel GMA950, but the target machine has an Nvidia GeForce 8800, so it is difficult to profile OpenGL stuff at this stage.
I did some very unscientific profiling and found that most of the slow down is coming from dealing with the float image: scaling all the pixels by a constant to fade them out, and converting the float image to unsigned chars and uploading to the graphics hardware. So, I'm looking at the following options for optimization:
Replace floats with uint32's in a fixed point 16.16 configuration
Optimize float operations using SSE2 assembly (image buffer is a 1024*768*3 array of floats)
Use OpenGL Accumulation Buffer instead of float array
Use OpenGL floating-point FBO's instead of float array
Use OpenGL pixel/vertex shaders
Have you any experience with any of these possibilities? Any thoughts, advice? Something else I haven't thought of?
The problem is simply the sheer amount of data you have to process.
Your float buffer is 9 megabytes in size, and you touch the data more than once. Most likely your rendering loop looks somewhat like this:
Clear the buffer
Render something on it (uses reads and writes)
Convert to unsigned bytes
Upload to OpenGL
That's a lot of data to move around, and the cache can't help you much because the image is much larger than your cache. Let's assume you touch every pixel five times; then you move 45 MB of data in and out of slow main memory. 45 MB does not sound like much, but consider that almost every memory access will be a cache miss. The CPU will spend most of its time waiting for the data to arrive.
If you want to stay on the CPU to do the rendering there's not much you can do. Some ideas:
Using SSE for non-temporal loads and stores may help, but it will complicate your task quite a bit (you have to align your reads and writes).
Try breaking up your rendering into tiles, e.g. do everything on smaller rectangles (256*256 or so); a sketch follows this list. The idea is that you actually get a benefit from the cache: after you've cleared your rectangle, for example, the entire tile will be in the cache. Rendering and converting to bytes will be a lot faster because there is no need to fetch the data from the relatively slow main memory anymore.
Last resort: Reduce the resolution of your particle effect. This will give you a good bang for the buck at the cost of visual quality.
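Here is what the tiling idea might look like (FadeTile and ConvertTileToBytes are hypothetical helpers standing in for your per-tile passes; conveniently, 1024 and 768 are exact multiples of 256):

// Sketch of tiled processing: run every pass over one 256x256 tile
// before moving on, so the tile stays in cache between passes.
const int TILE = 256;
for (int ty = 0; ty < height; ty += TILE) {
    for (int tx = 0; tx < width; tx += TILE) {
        FadeTile(floatBuf, tx, ty, TILE);                     // scale pixels by a constant
        ConvertTileToBytes(floatBuf, byteBuf, tx, ty, TILE);  // float -> unsigned char
    }
}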
The best solution is to move the rendering onto the graphics card. Render-to-texture functionality is standard these days. It's a bit tricky to get working with OpenGL because you have to decide which extension to use, but once you have it working, performance is not an issue anymore.
Btw - do you really need floating-point render targets? If you can get away with 3 bytes per pixel, you will see a nice performance improvement.
It's best to move the rendering calculation for massive particle systems like this over to the GPU, which has hardware optimized to do exactly this job as fast as possible.
Aaron is right: represent each individual particle with a sprite. You can calculate the movement of the sprites in space (e.g., accumulate their position per frame) on the CPU using SSE2, but do all the additive blending and accumulation on the GPU via OpenGL. (Drawing sprites additively is easy enough.) You can handle your trails and blur either by doing it in shaders (the "pro" way), rendering to an accumulation buffer and back, or simply generating a bunch of additional sprites on the CPU representing the trail and throwing them at the rasterizer.
Try replacing the manual code with sprites: an OpenGL texture with an alpha of, say, 10%. Then draw lots of them on the screen (ten of them in the same place to get the full glow).
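A sketch of the blending state this needs (DrawParticleQuad stands in for however you draw one textured quad):

// Sketch: additive blending, so overlapping low-alpha sprites
// accumulate into a glow.
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE);       // src*alpha + dst
for (int i = 0; i < 10; ++i)
    DrawParticleQuad(x, y, size);        // ten quads in the same place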
If by "manual" you mean that you are using the CPU to poke pixels, then pretty much anything where you draw textured polygons with OpenGL instead will represent a huge speedup.
Since blending is hurting our game's performance, we tried several strategies for creating the "illusion" of blending. One of them is drawing a sprite only every odd frame, so the sprite is visible half of the time. The effect is quite good. (You'd need a proper frame rate, by the way, or the sprite will flicker noticeably.)
Despite that, I would like to know if there are any good insights out there on avoiding blending in order to improve overall performance without compromising (too much of) the visual experience.
Is it the actual blending that's killing your performance? (i.e. video memory bandwidth)
What games commonly do these days to handle lots of alpha blended stuff (think large explosions that cover whole screen): render them into a smaller texture (e.g. 2x2 smaller or 4x4 smaller than screen), and composite them back onto the main screen.
Doing that might require rendering the depth buffer of opaque surfaces into that smaller texture as well, to properly handle intersections with opaque geometry. On some platforms (consoles), multisampling or depth-buffer trickery might make that a very cheap operation; no such luck on a regular PC, though.
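A sketch of the idea (fboTex, particleFbo, DrawParticles, and DrawFullScreenQuad are placeholders; use the EXT-suffixed calls if your context only exposes EXT_framebuffer_object):

// Sketch: render particles into a 4x4-smaller off-screen target,
// then composite the result back onto the main screen.
glBindFramebuffer(GL_FRAMEBUFFER, particleFbo);
glViewport(0, 0, screenW / 4, screenH / 4);
DrawParticles();                          // the expensive blended pass
glBindFramebuffer(GL_FRAMEBUFFER, 0);     // back to the main target
glViewport(0, 0, screenW, screenH);
glBindTexture(GL_TEXTURE_2D, fboTex);     // the FBO's color attachment
DrawFullScreenQuad();                     // composite over the scene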
See the article from GPU Gems 3, for example: High-Speed, Off-Screen Particles. Christer Ericson's blog post also overviews a lot of optimization approaches: Optimizing the rendering of a particle system
Excellent article here about rendering particle systems quickly. It covers the smaller off-screen buffer technique and suggests quite a few other approaches.
You can read it here
It is not quite clear from your question what kind of blending hits your game's performance. Generally, blending is blazingly fast. If your problems are particle-system related, then what is most likely killing the framerate is the number and size of particles drawn. In particular, lots of close-up (and therefore large) particles require high memory bandwidth and fill rate from the graphics card. I have implemented a particle system myself, and while I can render tons of particles in the distance, I feel the negative impact of, e.g., flying through smoke (which fills the entire screen because the viewer is in the midst of it) very much on weaker hardware.