Direct2D performance

I've been working on a project in DirectX 11 for a while now, but some people have suggested that I should be doing it in Direct2D, so I've been playing with the idea in my project. What I've ended up with is HORRIFIC performance. Is Direct2D intended for use with hundreds of thousands of vertices? Because that's what I'm using.

Direct2D gives you a simple-to-use API for drawing high-quality 2D graphics, but it comes at a performance cost compared to fine-tuned DX11 rendering.
Here is a rough cost per primitive draw, for reference:
FillRoundedRectangle (1 pixel corner): 96
DrawRoundedRectangle: 264
FillRectangle: 6
DrawRectangle: 190
FillEllipse: 204
DrawEllipse: 204
Line (regardless of orientation/size): 46
Bezier: from 312 to 620
Direct2D is built on a feature level 10 device and builds vertices in an internal buffer (the only case where it uses instancing is drawing text).
So if you need to batch primitives, an instanced drawer can yield a pretty hefty performance gain (as a personal example, my timeline keyframe rendering went from 15 ms down to 2 ms after swapping the draws from D2D to a custom DX11 instanced shader).
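For illustration, here is a minimal sketch of what a structured-buffer instanced drawer for simple rectangles could look like on D3D11. It is not the answerer's code; the PerInstance layout, the buffer names, and the matching HLSL vertex shader are assumptions.

```cpp
#include <d3d11.h>
#include <vector>
#include <cstring>

// Hypothetical per-instance data; a matching HLSL vertex shader would read a
// StructuredBuffer<PerInstance> bound at t0 and expand a unit quad using SV_InstanceID.
struct PerInstance { float x, y, w, h; float r, g, b, a; };

void DrawRects(ID3D11DeviceContext* ctx,
               ID3D11Buffer* instanceBuffer,          // dynamic structured buffer (assumed created elsewhere)
               ID3D11ShaderResourceView* instanceSRV, // SRV over instanceBuffer
               const std::vector<PerInstance>& rects)
{
    // Upload this frame's instance data in one go.
    D3D11_MAPPED_SUBRESOURCE mapped;
    if (FAILED(ctx->Map(instanceBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
        return;
    std::memcpy(mapped.pData, rects.data(), rects.size() * sizeof(PerInstance));
    ctx->Unmap(instanceBuffer, 0);

    // One draw call for all rectangles: 4 vertices per quad, N instances.
    ctx->VSSetShaderResources(0, 1, &instanceSRV);
    ctx->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP);
    ctx->DrawInstanced(4, static_cast<UINT>(rects.size()), 0, 0);
}
```

The point is the single DrawInstanced call per batch, instead of one Direct2D Fill/Draw call per primitive.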
If you are on Windows 8, you can easily mix Direct2D and Direct3D in the same "viewport", so it can be worth a look (in my use case I use DX11 with structured-buffer-based instancing for all the heavy parts, and switch to a Direct2D context for text and other small bits).
If you need to draw custom geometry (especially with a reasonably high polygon count), it's best to stick to Direct3D 11, since that's exactly what it's designed for.
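As a rough illustration of the Windows 8 Direct2D/Direct3D mixing mentioned above (Direct2D 1.1), here is a hedged sketch of creating a D2D device context that targets a D3D11 swap chain's back buffer. It assumes the D3D11 device was created with D3D11_CREATE_DEVICE_BGRA_SUPPORT and a BGRA swap chain; it is not the answerer's actual code.

```cpp
#include <d3d11.h>
#include <d2d1_1.h>
#include <dxgi.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

HRESULT CreateD2DContext(ID3D11Device* d3dDevice, IDXGISwapChain* swapChain,
                         ComPtr<ID2D1DeviceContext>& d2dContext)
{
    // Direct2D 1.1 factory.
    ComPtr<ID2D1Factory1> factory;
    D2D1_FACTORY_OPTIONS opts = {};
    HRESULT hr = D2D1CreateFactory(D2D1_FACTORY_TYPE_SINGLE_THREADED,
                                   __uuidof(ID2D1Factory1), &opts,
                                   reinterpret_cast<void**>(factory.GetAddressOf()));
    if (FAILED(hr)) return hr;

    // Create a D2D device on top of the D3D11 device's DXGI device.
    ComPtr<IDXGIDevice> dxgiDevice;
    hr = d3dDevice->QueryInterface(IID_PPV_ARGS(&dxgiDevice));
    if (FAILED(hr)) return hr;

    ComPtr<ID2D1Device> d2dDevice;
    hr = factory->CreateDevice(dxgiDevice.Get(), &d2dDevice);
    if (FAILED(hr)) return hr;

    hr = d2dDevice->CreateDeviceContext(D2D1_DEVICE_CONTEXT_OPTIONS_NONE, &d2dContext);
    if (FAILED(hr)) return hr;

    // Wrap the swap chain's back buffer as a D2D bitmap and make it the target,
    // so D2D draws land in the same surface as the D3D11 rendering.
    ComPtr<IDXGISurface> backBuffer;
    hr = swapChain->GetBuffer(0, IID_PPV_ARGS(&backBuffer));
    if (FAILED(hr)) return hr;

    D2D1_BITMAP_PROPERTIES1 props = D2D1::BitmapProperties1(
        D2D1_BITMAP_OPTIONS_TARGET | D2D1_BITMAP_OPTIONS_CANNOT_DRAW,
        D2D1::PixelFormat(DXGI_FORMAT_B8G8R8A8_UNORM, D2D1_ALPHA_MODE_PREMULTIPLIED));
    ComPtr<ID2D1Bitmap1> target;
    hr = d2dContext->CreateBitmapFromDxgiSurface(backBuffer.Get(), &props, &target);
    if (FAILED(hr)) return hr;

    d2dContext->SetTarget(target.Get());
    return S_OK;
}
```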

Related

How to draw high-speed diagonal lines in Windows?

I need to visualize a design in a Windows application, and therefore need to draw diagonal lines very fast. I tried working with GDI+ (because I need transparency), and drawing diagonal lines is about 10 times slower than drawing vertical/horizontal lines. Sometimes I need about 400 ms to draw 2000 diagonals that cross the screen.
After this I tested Direct2D, which was about 2x faster than GDI+, but still nowhere near fast enough. Now I am starting to look at OpenGL for drawing the 2D graphics; there I would look at the scene from above and use an orthographic projection.
Can anybody tell me what the right way is to draw high-speed diagonals?
Regards, Pete
You could manually rasterize the lines into a pixel buffer (simply a 2D integer array), preferably in a faster language (e.g. an external DLL written in C), then blit the buffer as a bitmap to your window (GDI or DirectDraw is sufficient).
A good procedure for this is Bresenham's line algorithm: https://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm.
If you need anti-aliasing: https://en.wikipedia.org/wiki/Xiaolin_Wu%27s_line_algorithm.
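For reference, here is a minimal sketch of Bresenham's algorithm rasterizing into a 32-bit pixel buffer (row-major, width x height). The buffer layout and the bounds check are illustrative assumptions.

```cpp
#include <cstdint>
#include <cstdlib>

// Rasterize a line from (x0,y0) to (x1,y1) into a row-major ARGB pixel buffer.
void DrawLine(uint32_t* pixels, int width, int height,
              int x0, int y0, int x1, int y1, uint32_t color)
{
    int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;  // combined error term for both axes

    for (;;) {
        if (x0 >= 0 && x0 < width && y0 >= 0 && y0 < height)
            pixels[y0 * width + x0] = color;      // plot the current pixel
        if (x0 == x1 && y0 == y1) break;          // reached the end point
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }    // step in x
        if (e2 <= dx) { err += dx; y0 += sy; }    // step in y
    }
}
```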
I used this method for a software renderer project and achieved a 2x performance increase (the same 2x you got with Direct2D, but I'm still using GDI+) when I migrated from VB.NET to C. I was able to get 40+ FPS at 800x600 on an old 1.7 GHz Pentium machine, and ~25 even with GDI+, so I'm not sure why performance is inadequate in your case.
For the highest performance use OpenGL or Direct3D. You'll be able to take advantage of hardware-accelerated drawing at the expense of more setup code. You should easily be able to draw tens of thousands of lines in under 16 ms.
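As a rough sketch of the hardware-accelerated route (not a complete program): draw all the diagonals with one buffer upload and a single glDrawArrays(GL_LINES, ...) call. It assumes an OpenGL 1.5+ context with the buffer-object entry points loaded (e.g. via GLEW) and a VBO created elsewhere.

```cpp
#include <GL/glew.h>
#include <vector>

struct Point2 { float x, y; };  // two Point2 entries per line (start, end)

void DrawLineBatch(GLuint vbo, const std::vector<Point2>& endpoints)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    // Stream this frame's endpoints into the buffer in one call.
    glBufferData(GL_ARRAY_BUFFER,
                 endpoints.size() * sizeof(Point2),
                 endpoints.data(), GL_STREAM_DRAW);

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(2, GL_FLOAT, sizeof(Point2), nullptr);  // offset 0 into the bound VBO

    // One draw call for the whole batch instead of one call per line.
    glDrawArrays(GL_LINES, 0, static_cast<GLsizei>(endpoints.size()));

    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}
```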

OS X Sprite Kit - Dirty Rects/Regions

Some background:
I have an existing OS X card game app that uses OpenGL.
The window is resizable, and a 4:3 aspect ratio is always maintained.
When the window is resized, the OpenGL view is resized accordingly. All visual elements are scaled accordingly. i.e. the cards maintain their relative sizes and distances from each other.
I'm interested in moving the code to a system that either uses Sprite Kit, or one predominantly based on Core Animation layers. Sprite Kit is more attractive to me in terms of feature set for my needs, but...
... I am concerned about Sprite Kit performance (or rather, needless work, particularly on battery-powered Macs) for a game that essentially blasts the same textures to the screen at 60 fps, even when nothing much is happening. (Most of the time the cards are static, as the player ponders their next move.)
To reduce some of the (repetitive) drawing required, particularly at very large window sizes (e.g. fullscreen on a 30" monitor), I'm interested in using a "dirty rects/region" or "as-required" drawing system.
Question:
Does Sprite Kit provide some kind of dirty-rect drawing system, or the ability to implement such a drawing system? (Or, is it basically going to draw everything over and over at 60fps, regardless of the need to redraw?)
SK is an OpenGL renderer, so naturally it will redraw its contents every frame. That, however, doesn't make it slow. The dirty-rect drawing of UI frameworks is a way to improve performance and reduce power consumption, but those frameworks have to use that approach because their rendering is typically a lot slower (often not hardware accelerated) than an OpenGL renderer's.
On the other hand, SK can be slower frame over frame if the rendered scene's complexity is extreme. But that sounds highly unlikely for a card game.
Generally, you shouldn't concern yourself with performance until you have written some code to test it with. Premature optimization and all that...

Many or few textures (3D engine performance)

I have two ways to continue programming a hexagonal map at the moment, and I don't know which one is better. Maybe you can help me :)
I use a texture to represent the "grid", so the quad with this texture is static and isn't moved or edited at runtime.
On the one hand, I have a single texture of 7700x6736 pixels; however, its file size is only 3,131 KB. When I run it in an engine (Unity in this case) the frame rate is fine (a constant 60 fps with VSync, and 100+ without VSync).
This texture is assigned through one transparent material to the quad (2 triangles).
With the second mode, I have 14 textures of 550x496 pixels and 21 KB. But with this mode I need 14 quads (28 triangles versus 2) and 14 materials with different textures, versus 1 the other way.
Also, with this second mode, I need to check the distance to every quad to decide whether or not to hide it (a simple occlusion culling).
Which way is better, in your opinion?
While your 7k texture works on your dev machine, it may not be supported on some of the platforms you'll target. I'd use 2048^2 as a safe maximum, or even 1024^2.
The second problem is that it may take 3 MB as a JPG/PNG compressed file, but in video memory it will be stored uncompressed; at 32-bit RGBA, 7700x6736 pixels is roughly 200 MB (unless you use a texture-specific compression format, but then you may run into platform-support problems again).
Additionally, you should consider whether you really need non-power-of-two textures; officially they should be supported these days, but you can still run into problems on some older hardware.
In general your solution depends on the platforms you want to target, and especially on whether you plan to target mobile devices (and which ones).

OpenGL tile rendering: most efficient way?

I am creating a tile-based 2D game as a way of learning basic "modern" OpenGL concepts. I'm using shaders with OpenGL 2.1, and am familiar with the rendering pipeline and how to actually draw geometry on-screen. What I'm wondering is the best way to organize a tilemap so it renders quickly and efficiently. I have thought of several potential methods:
1.) Store the quad representing a single tile (vertices and texture coordinates) in a VBO and render each tile with a separate draw* call, translating it to the correct position onscreen and using uniform2i to give the location in the texture atlas for that particular tile;
2.) Keep a VBO containing every tile onscreen (already-computed screen coordinates and texture atlas coordinates), using BufferSubData to update the tiles every frame but using a single draw* call;
3.) Keep VBOs containing static NxN "chunks" of tiles, drawing however many chunks of tiles are at least partially visible onscreen and translating them each into position.
*I'd like to stay away from the last option if possible unless rendering chunks of 64x64 is not too inefficient. Tiles are loaded into memory in blocks of that size, and even though only about 20x40 tiles are visible onscreen at a time, I would have to render up to four chunks at once. This method would also complicate my code in several other ways.
So, which of these is the most efficient way to render a screen of tiles? Are there any better methods?
You could do any one of these and they would probably be fine; what you're proposing to render is very, very simple.
#1 will definitely be worse in principle than the other options, because you would be drawing many extremely simple “models” rather than letting the GPU do a whole lot of batch work on one draw call. However, if you have only 20×40 = 800 tiles visible on screen at once, then this is a trivial amount of work for any modern CPU and GPU (unless you're doing some crazy fragment shader).
I recommend you go with whichever is simplest to program for you, so that you can continue work on your game. I imagine this would be #1, or possibly #2. If and when you find yourself with a performance problem, do whichever of #2 or #3 (64×64 sounds like a fine chunk size) lets you spend the least CPU time on your program's part of drawing (i.e. updating the buffer(s)).
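As a hedged sketch of option #2 under OpenGL 2.1: rebuild the on-screen tile quads each frame into one pre-allocated VBO with glBufferSubData and issue a single draw call. GLEW is assumed for the buffer entry points, and the 32-pixel tile size, 16x16 atlas layout, and attribute locations 0/1 are illustrative assumptions, not anything from the question.

```cpp
#include <GL/glew.h>
#include <vector>

struct TileVertex { float x, y; float u, v; };   // interleaved position + atlas UV

// tiles: one atlas index per visible tile, row-major, cols x rows entries.
void DrawVisibleTiles(GLuint vbo, const std::vector<int>& tiles, int cols, int rows)
{
    const float TILE = 32.0f;                    // assumed tile size in pixels
    const int   ATLAS = 16;                      // assumed 16x16 tile atlas

    std::vector<TileVertex> verts;
    verts.reserve(tiles.size() * 6);             // two triangles per tile

    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            int idx = tiles[r * cols + c];
            float x = c * TILE, y = r * TILE;
            float u0 = (idx % ATLAS) / float(ATLAS), v0 = (idx / ATLAS) / float(ATLAS);
            float u1 = u0 + 1.0f / ATLAS,           v1 = v0 + 1.0f / ATLAS;
            TileVertex quad[6] = {
                {x, y, u0, v0}, {x + TILE, y, u1, v0}, {x, y + TILE, u0, v1},
                {x + TILE, y, u1, v0}, {x + TILE, y + TILE, u1, v1}, {x, y + TILE, u0, v1}};
            verts.insert(verts.end(), quad, quad + 6);
        }

    glBindBuffer(GL_ARRAY_BUFFER, vbo);          // VBO pre-sized once with glBufferData
    glBufferSubData(GL_ARRAY_BUFFER, 0, verts.size() * sizeof(TileVertex), verts.data());

    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, sizeof(TileVertex), (void*)0);
    glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, sizeof(TileVertex), (void*)(2 * sizeof(float)));
    glEnableVertexAttribArray(0);
    glEnableVertexAttribArray(1);

    // One draw call for every visible tile.
    glDrawArrays(GL_TRIANGLES, 0, static_cast<GLsizei>(verts.size()));
}
```

A matching vertex shader would just transform the position and pass the UV through; the key property is that the per-frame CPU cost is one buffer update plus one draw call.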
I've been recently learning modern OpenGL myself, through OpenGL ES 2.0 on Android. The OpenGL ES 2.0 Programming Guide recommends an "array of structures", that is,
"Store vertex attributes together in a single buffer. The structure represents all attributes of a vertex and we have an array of these attributes per vertex."
While this may seem like it would initially consume a lot of space, it allows for efficient rendering using VBOs and flexibility in texture-mapping each tile. I recently did a 20x20 tiled hex grid on a Droid 2 using interleaved arrays containing vertex, normal, color, and texture data. So far things are running smoothly.

graphics: best performance with floating point accumulation images

I need to speed up some particle system eye candy I'm working on. The eye candy involves additive blending, accumulation, and trails and glow on the particles. At the moment I'm rendering by hand into a floating point image buffer, converting to unsigned chars at the last minute then uploading to an OpenGL texture. To simulate glow I'm rendering the same texture multiple times at different resolutions and different offsets. This is proving to be too slow, so I'm looking at changing something. The problem is, my dev hardware is an Intel GMA950, but the target machine has an Nvidia GeForce 8800, so it is difficult to profile OpenGL stuff at this stage.
I did some very unscientific profiling and found that most of the slow down is coming from dealing with the float image: scaling all the pixels by a constant to fade them out, and converting the float image to unsigned chars and uploading to the graphics hardware. So, I'm looking at the following options for optimization:
Replace floats with uint32's in a fixed point 16.16 configuration
Optimize float operations using SSE2 assembly (image buffer is a 1024*768*3 array of floats)
Use OpenGL Accumulation Buffer instead of float array
Use OpenGL floating-point FBO's instead of float array
Use OpenGL pixel/vertex shaders
Have you any experience with any of these possibilities? Any thoughts, advice? Something else I haven't thought of?
The problem is simply the sheer amount of data you have to process.
Your float buffer is 9 megabytes in size, and you touch the data more than once. Most likely your rendering loop looks somewhat like this:
Clear the buffer
Render something on it (uses reads and writes)
Convert to unsigned bytes
Upload to OpenGL
That's a lot of data to move around, and the cache can't help you much because the image is much larger than the cache. Let's assume you touch every pixel five times; then you move 45 MB of data in and out of slow main memory. 45 MB does not sound like much, but consider that almost every memory access will be a cache miss, so the CPU will spend most of its time waiting for the data to arrive.
If you want to stay on the CPU to do the rendering there's not much you can do. Some ideas:
Using SSE non-temporal loads and stores may help, but it will complicate your task quite a bit (you have to align your reads and writes).
Try breaking your rendering up into tiles, e.g. do everything on smaller rectangles (256x256 or so). The idea is that you actually get a benefit from the cache: after you've cleared a rectangle, for example, the whole tile is in the cache, and rendering it and converting it to bytes will be a lot faster because there is no need to fetch the data from relatively slow main memory anymore (see the sketch after this list).
Last resort: reduce the resolution of your particle effect. This will give you good bang for the buck at the cost of visual quality.
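Here is a minimal sketch of the tiling idea for the 1024x768x3 float buffer: walk the image in 256x256 blocks and fuse the per-frame fade with the float-to-byte conversion so each block is touched while it is still cache-hot. The fade factor and buffer layout are assumptions.

```cpp
#include <algorithm>
#include <cstdint>

// Fade the float image and convert it to bytes, one 256x256 tile at a time.
// src: width*height*3 floats (assumed non-negative), dst: width*height*3 bytes.
void FadeAndConvertTiled(float* src, uint8_t* dst, int width, int height, float fade)
{
    const int TILE = 256;
    for (int ty = 0; ty < height; ty += TILE)
        for (int tx = 0; tx < width; tx += TILE) {
            int ymax = std::min(ty + TILE, height);
            int xmax = std::min(tx + TILE, width);
            for (int y = ty; y < ymax; ++y)
                for (int x = tx; x < xmax; ++x)
                    for (int c = 0; c < 3; ++c) {
                        int i = (y * width + x) * 3 + c;
                        src[i] *= fade;                         // fade pass
                        float v = src[i] * 255.0f;              // conversion pass, same cache-hot data
                        dst[i] = static_cast<uint8_t>(std::min(v, 255.0f));
                    }
        }
}
```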
The best solution is to move the rendering onto the graphics card. Render-to-texture functionality is standard these days. It's a bit tricky to get working with OpenGL because you have to decide which extension to use, but once you have it working, performance is no longer an issue.
Btw - do you really need floating-point render targets? If you can get away with 3 bytes per pixel you will see a nice performance improvement.
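For the "floating-point FBO" option in the question, here is a hedged sketch of creating a half-float render target, assuming GL 3.0-style framebuffer entry points are available (e.g. via GLEW); the sizes and formats are assumptions.

```cpp
#include <GL/glew.h>

// Create a framebuffer backed by an RGBA16F texture; returns the FBO id (0 on failure)
// and hands back the texture id via texOut. Error cleanup is omitted for brevity.
GLuint CreateFloatTarget(int width, int height, GLuint* texOut)
{
    GLuint tex, fbo;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    // 16-bit float RGBA is usually enough headroom for additive accumulation and glow.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, width, height, 0,
                 GL_RGBA, GL_FLOAT, nullptr);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0);
    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
        return 0;                                  // caller should fall back to another path

    glBindFramebuffer(GL_FRAMEBUFFER, 0);          // back to the default framebuffer
    *texOut = tex;
    return fbo;
}
```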
It's best to move the rendering calculation for massive particle systems like this over to the GPU, which has hardware optimized to do exactly this job as fast as possible.
Aaron is right: represent each individual particle with a sprite. You can calculate the movement of the sprites in space (e.g., accumulate their positions per frame) on the CPU using SSE2, but do all the additive blending and accumulation on the GPU via OpenGL. (Drawing sprites additively is easy enough.) You can handle your trails and blur either by doing it in shaders (the "pro" way), by rendering to an accumulation buffer and back, or simply by generating a bunch of additional sprites on the CPU to represent the trail and throwing them at the rasterizer.
Try replacing the manual code with sprites: an OpenGL texture with an alpha of, say, 10%, then draw lots of them on the screen (ten in the same place to get the full glow).
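A tiny sketch of the blend state that gives the additive "stacking" described above (fixed-function OpenGL; the texture id is an assumed, already-created sprite texture):

```cpp
#include <GL/gl.h>

// Enable additive blending so overlapping low-alpha sprites accumulate into a glow.
void BeginAdditiveSprites(GLuint spriteTexture)
{
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, spriteTexture);
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE);   // source*alpha is added on top of what's already there
    glColor4f(1.0f, 1.0f, 1.0f, 0.10f);  // ~10% alpha per sprite, as suggested above
}
```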
If by "manual" you mean that you are using the CPU to poke pixels, then pretty much anything where you instead draw textured polygons with OpenGL will be a huge speedup.
