Dynamic or Static vertex buffer? - performance

I'm writing a directx app and want to overlay a grid on the front of the scene. The grid will possibly update every frame but will be something like 20 horizontal lines and 20 vertical lines (LineList).
I'm trying to understand if this situation (small amount of vertices updated frequently) means a dynamic buffer is more appropriate than a static buffer?
Is anyone able to advise on this? I've not been able to find a low-level explanation of the difference between the two - it sounds like dynamic is 'more accessible' to the CPU and requires some locking semantics whereas static is less accessible.
Cheers

You will likely want to use a dynamic vertex buffer. If you want to update the vertices on a per frame basis then dynamic is the way to go.
See this MSDN article for a more low-level description
MSDN Article

If you are changing the buffer every frame, use Dynamic buffers.
Using static buffers will cause the GPU to stall every time you change the buffer, which will crash performance.
I'm not sure about dynamic buffers in direct3d10, the name seems to come from direct3d9. Direct3D10 has a somewhat more elaborate scheme of creating 'dynamic' buffers but you should not be using Static buffers in any case.

Related

Get ID3DXMesh from CDXUTSDKMesh

In a DirectX 11 demo application, I use a CDXUTSDKMesh for my static geometry. It is loaded and already displayed.
I'm doing some experiments related to Precomputed Radiance Transfer in this application. The ID3DXPRTEngine would be a pretty handy tool for the job. Unfortunately, D3DX and DXUT don't seem to be very compatible there.
ID3DXPRTEngine requires a single ID3DXMesh (multiple meshes can be concatenated with D3DXConcatenateMeshes no problem). Is there an easy way to 'convert' CDXUTSDKMesh to one or multiple ID3DXMesh instances?
Short Answer: No.
Long answer: It can be done, but it requires that you be able to read the raw vertex data, convert it to the Direct3D 9 FVF system (the old fixed shader pipeline), and write the new data back.
Basically what you're trying to do is inter-operate two different versions of the Direct3D API: version 9.0 and version 11.0. In order to do this, you need to do the following:
Confirm your verticies contain no custom semantics (i.e. D3D11 allows a semantic like RANDOM_EYE_VEC, whereas D3D9 does not).
Confirm you have the ability to read your vertex and face information into a byte buffer (i.e. a char pointer).
Create a Direct3D 9 Device and a Direct3D 11 device.
Open your mesh, pull the data into a byte buffer, and create an ID3DXMesh object from that data. If you have more than 2^16 faces, you will need to split the mesh into two or more meshes.
Create the PRT Engine from the mesh(es), and run your PRT calculations.
Read the new information from the mesh(es) using ID3DXBaseMesh::GetVertexBuffer() (not sure if this step is entirely correct as I've never actually used the PRT engine).
Write the new data back into the CDXUTSDKMesh.
As you may be able to tell, this process is long, prone to error, and very slow. If you're doing this for an offline tool, it may be OK, but if this is to be used in real time, you're probably better off writing your own PRT engine.
Additionally, if your mesh has any custom semantics (i.e. you use a shader that expects some data that wasn't a part of the D3D9 world) then this algorithm will fail. The PRT engine uses the fixed function pipeline, so any new meshes that take advantage of D3D11 features that didn't exist back then are not going to work here.

performance of layered canvases vs manual drawImage()

I've written a small graphics engine for my game that has multiple canvases in a tree(these basically represent layers.) Whenever something in a layer changes, the engine marks the affected layers as "soiled" and in the render code the lowest affected layer is copied to its parent via drawImage(), which is then copied to its parent and so on up to the root layer(the onscreen canvas.) This can result in multiple drawImage() calls per frame but also prevents rerendering anything below the affected layer. However, in frames where nothing changes no rendering or drawImage() calls take place, and in frames where only foreground objects move, rendering and drawImage() calls are minimal.
I'd like to compare this to using multiple onscreen canvases as layers, as described in this article:
http://www.ibm.com/developerworks/library/wa-canvashtml5layering/
In the onscreen canvas approach, we handle rendering on a per-layer basis and let the browser handle displaying the layers on screen properly. From the research I've done and everything I've read, this seems to be generally accepted as likely more efficient than handling it manually with drawImage(). So my question is, can the browser determine what needs to be re-rendered more efficiently than I can, despite my insider knowledge of exactly what has changed each frame?
I already know the answer to this question is "Do it both ways and benchmark." But in order to get accurate data I need real-world application, and that is months away. By then if I have an acceptable approach I will have bigger fish to fry. So I'm hoping someone has been down this road and can provide some insight into this.
The browser cannot determine anything when it comes to the canvas element and the rendering as it is a passive element - everything in it is user rendered by the means of JavaScript. The only thing the browser does is to pipe what's on the canvas to the display (and more annoyingly clear it from time to time when its bitmap needs to be re-allocated).
There is unfortunately no golden rule/answer to what is the best optimization as this will vary from case to case - there are many techniques that could be mentioned but they are merely tools you can use but you will still have to figure out what would be the right tool or the right combination of tools for your specific case. Perhaps layered is good in one case and perhaps it doesn't bring anything to another case.
Optimization in general is very much an in-depth analysis and break-down of patterns specific to the scenario, that are then isolated and optimized. The process if often experiment, benchmark, re-adjust, experiment, benchmark, re-adjust, experiment, benchmark, re-adjust... of course experience reduce this process to a minimum but even with experience the specifics comes in a variety of combinations that still require some fine-tuning from case to case (given they are not identical).
Even if you find a good recipe for your current project it is not given that it will work optimal with your next project. This is one reason no one can give an exact answer to this question.
However, when it comes canvas what you want to achieve is a minimum of clear operations and minimum areas to redraw (drawImage or shapes). The point with layers is to groups elements together to enable this goal.

DrawPrimitives performance

I want to draw single faces instead of xna models because it's too slow.
But I don't know what the difference is between
DrawPrimitives
DrawUserPrimitives
DrawIndexedPrimitives
DrawUserIndexedPrimitives
Which one is the fastest method? And what are the indices good for?
The simple answer to your question is that the "User" versions are a fair bit slower on the CPU because they have to transfer vertex data to the GPU (via the driver and the bus) each time they are called.
The non-User versions use vertex and index buffers that already exist on the GPU (you put them there at load time). They have considerably less data to transfer, so they are faster.
The "User" and "Indexed" versions will also each have a performance impact on the GPU. This impact is relatively tiny. Generally speaking you don't need to worry about it.
The User versions exist because they are faster when your data changes each frame. There is also DynamicVertexBuffer which can be used with the non-User version of the draw functions. I believe it is slightly faster than the User methods in cases where you can pre-allocate the buffer at the desired size.
The Indexed versions allow you to select vertices out of your vertex buffer using an index buffer (so triangles that you draw can choose vertices at any position in the vertex buffer). The alternative is that your vertex buffer is simply interpreted as as sequential list of triangle vertices (based on PrimitiveType). The main reason for the existence of index buffers is to remove the need for duplicate vertices in your vertex buffer (which would require additional memory and processing on the GPU).
BUT...
XNA's Model class internally uses DrawIndexedPrimitives. Not only that, but it uses it correctly (ie: it doesn't draw single faces - but as many as it can at once - for the best performance). So if you are finding that it is slow, then your problem lies elsewhere.
I suggest trying to diagnose the reason why your game is performing poorly, before trying to select a "solution". Maybe ask for help doing that in a question here (or on https://gamedev.stackexchange.com/).
All in one time if you can , Instancied draw will be always better , but that need you to give all the textures in one time ! In my case , for example , I like to draw instancied objects with 1 only texture ... all the trees , all the ground , all buildings , etc ...

Speed of large amount of animated bitmaps in EaselJS

I seem to have a bit of trouble with using a large amount of animated bitmaps (all based on the same spritesheet) when using EaselJS. When I run a couple of these at once on my stage, there is no problem at all, but when running a higher amount of them at the same time (starting at around 30 to 40) whilst moving them around I start to have them "flicker" quite a bit, even at an fps of around 10.
I'm not using any shadows or anything else along those lines. Just using animated bitmaps and moving them around.
Does anyone have any good advice around increasing this performance?
Without seeing your code it's hard to know exactly where the bottleneck is. But here are a few places to start looking (starting with the more trivial fixes):
Make sure you are using a modern browser. In the very least, check across a few other browsers/platforms to see if that has any significant change in performance. From what I understand, EaselJS performance is significantly worse on non-hardware accelerated canvas implementations.
If you can, use createJS's version of TweenJS over other tweening libraries. TweenJS will tie itself to EaselJS's Ticker class, which is more efficient.
Do not call stage.update() unless absolutely necessary. Since stage.update() is such an expensive call, you should be as stingy as possible. In fact, you shouldn't really call it at all if you are using the Ticker to regularly update the stage.
Cache wisely and aggressively. If you have complex static elements on the stage, caching them will save some cycles. However, there is an overhead to caching so save it for containers with a lot of static elements or complexly drawn shapes.
Lower the frequency that EaselJS checks for mouseovers. If you have enabled mouse over on the stage, pass in a lower frequency (documentation). If you don't need it (if you are only listening to clicks), don't enable it at all. Monitoring mouse overs is pretty dang expensive, especially if you have plenty of elements on stage.
Set stage.snapToPixelsEnabled to true. This may or may not help. Theoretically, rendering bitmaps on whole pixels is much more efficient, however this may cause some animations to become jagged and I haven't played around with it enough to know what the other pros and cons are.
I was able to get decent performance with around 600-800 spritesheets at 30FPS and basic tweening using Chrome on a 4 year old iMac (just a quick test).
try using several Stage objects at the same time.

Storing Large 2D Game Worlds

I've been experimenting with different ideas of how to store a 2D game world. I'm interested in hearing techniques of storing large quantities of objects while managing the set that's visible ( lets say 100,000 tiles square ). Obviously the techniques can vary based on how the game renders that space.
Lets assume that we're describing a scrolling 2d game world rather than screen based as you could fairly easily do screen based rendering from such a setup while the converse is a bit more messy.
Looking for language agnostic solutions here so it's more helpful to others.
Edit: I think a good answer here would be a general review of the ideas to consider when thinking about this, as some of the responders have attempted, but also begin to explain how different solutions would apply to those scenarios. It's a somewhat complex question, so I would expect a good answer to reflect that.
Quadtrees are a fairly efficient solution for storing data about a large 2-dimensional world and the objects within it.
You might get some ideas on how to implement this from some spatial data structures like range or kd trees.
However, the answer to this question would vary considerably depending exactly on how your game works.
Are we talking a 2D platformer with 10 enemies onscreen, 20 more offscreen but "active", and an unknown number more "inactive"? If so, you can probably store your whole level as an array of "screens" where you manipulate the ones closest to you.
Or do you mean a true 2D game with lots of up/down movement too? You might have to be a bit more careful here.
The platform is also of some importance. If you're implementing a simple platformer for desktop PCs, you probably wouldn't have to worry about performance as much as you would on an embedded device. This is no excuse to be naive about it, but you might not have to be terribly clever either.
This is a somewhat interesting question I think. Presumably someone smarter than I who has experience with implementing platformers has thought these things out already.
Break the world into smaller areas, and deal with them. Any solution to this problem is going to boil down to this concept (such as quadtrees, mentioned in another answer). The differences will be in how they subdivide the world.
How much data is stored per tile? How fast can players move across the world? What's the behavior of NPCs, etc., that are offscreen? Do they just reset when the player comes back (like old Zelda games)? Do they simply resume where they were? Do they do some kind of catch-up script?
How much different rendering data is going to be needed for different areas?
How much of the world can be seen at one time?
All of these questions are going to immpact your solution, as well as the capabilities of your platform. Coming up with a general answer for these without having a reasonable idea of these parameters is going to be a bit difficult.
Assuming that your game will only update what is visible and some area around what is visible, just break the world in "screens" (a "screen" is a rectangular area on the tilemap that can fill the whole screen). Keep in memory the "screens" around the visible area (and some more if you want to update entities which are close to the character - but there is little reason to update an entity that far away) and have the rest on disk with a cache to avoid loading/unloading of commonly visited areas when you move around. Some setup like:
+---+---+---+---+---+---+---+
|FFF|FFF|FFF|FFF|FFF|FFF|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|VVV|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|NNN|NNN|NNN|NNN|NNN|FFF|
+---+---+---+---+---+---+---+
|FFF|FFF|FFF|FFF|FFF|FFF|FFF|
+---+---+---+---+---+---+---+
Where "V" part is the "screen" where the center (hero or whatever) is, the "N" parts are those who are nearby and have active (updating) entities, are checked for collisions, etc and "F" parts are far parts which might get updated infrequently and are prone to be "swapped" out (stored to disk). Of course you might want to use more "N" screens than two :-).
Note btw that since 2D games do not usually hold much data instead of saving the far away parts to disk you might want to just keep them in memory compressed.
You probably want to use a single int or byte array that links to block types. If you need more optimization from there, then you'll want to link to more complicated data structures like oct trees from your array. There is a good discussion on a Java game forum here: http://www.javagaming.org/index.php/topic,20505.30.html text
Anything with links becomes very expensive because the pointer takes up something like 8 bytes each, depending upon the language, so depending upon how populated your world is it can get expensive very quickly (8 pointers 8 bytes each is 64 bytes per item, and a byte array is 1 byte per item). So unless 1/64 of your world is empty, a byte array is going to be a much better option. You're also going to need to spend a lot of time iterating down the tree whenever you're doing a lookup for collision or whatever else - a byte array will be an instantaneous lookup.
Hopefully that's detailed enough for you. :-)

Resources