DrawPrimitives performance

I want to draw single faces instead of XNA models because the models are too slow.
But I don't know what the difference is between
DrawPrimitives
DrawUserPrimitives
DrawIndexedPrimitives
DrawUserIndexedPrimitives
Which one is the fastest method? And what are the indices good for?

The simple answer to your question is that the "User" versions are a fair bit slower on the CPU because they have to transfer vertex data to the GPU (via the driver and the bus) each time they are called.
The non-User versions use vertex and index buffers that already exist on the GPU (you put them there at load time). They have considerably less data to transfer, so they are faster.
The "User" and "Indexed" versions will also each have a performance impact on the GPU. This impact is relatively tiny. Generally speaking you don't need to worry about it.
The User versions exist because they are faster when your data changes each frame. There is also DynamicVertexBuffer which can be used with the non-User version of the draw functions. I believe it is slightly faster than the User methods in cases where you can pre-allocate the buffer at the desired size.
The Indexed versions allow you to select vertices out of your vertex buffer using an index buffer (so the triangles you draw can choose vertices at any position in the vertex buffer). The alternative is that your vertex buffer is simply interpreted as a sequential list of triangle vertices (based on PrimitiveType). The main reason index buffers exist is to remove the need for duplicate vertices in your vertex buffer (which would require additional memory and processing on the GPU).
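To make the difference concrete, here is a minimal Python sketch (plain lists standing in for XNA vertex and index buffers; all names are illustrative) of a quad drawn as a triangle list. Without indices it needs 6 vertices, two of them duplicates; with an index buffer it needs only 4 unique vertices plus 6 small indices, and resolving the indices reproduces the same sequential list.

```python
# Illustrative only: a unit quad drawn as a triangle list.
# Vertices are (x, y) tuples; real vertices would also carry normals, UVs, etc.

# Non-indexed: each triangle lists its own 3 vertices, so shared corners repeat.
non_indexed = [
    (0, 0), (1, 0), (1, 1),  # triangle 1
    (0, 0), (1, 1), (0, 1),  # triangle 2 repeats (0, 0) and (1, 1)
]

# Indexed: each unique vertex is stored once...
vertices = [(0, 0), (1, 0), (1, 1), (0, 1)]
# ...and each triangle picks its corners by position in the vertex list.
indices = [0, 1, 2, 0, 2, 3]


def expand(vertices, indices):
    """Resolve an index list back into the sequential vertex list it describes."""
    return [vertices[i] for i in indices]


print(len(non_indexed), "vertices vs", len(vertices), "vertices +", len(indices), "indices")
```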
BUT...
XNA's Model class internally uses DrawIndexedPrimitives. Not only that, but it uses it correctly (i.e. it doesn't draw single faces, but as many as it can at once, for the best performance). So if you are finding that it is slow, then your problem lies elsewhere.
I suggest trying to diagnose the reason why your game is performing poorly, before trying to select a "solution". Maybe ask for help doing that in a question here (or on https://gamedev.stackexchange.com/).

Draw everything in one call if you can: instanced drawing will always be better, but it requires you to supply all the textures at once. In my case, for example, I like to draw instanced objects that use only one texture: all the trees, all the ground, all the buildings, etc.


drawElements vs drawArrays in webgl

Does it make any sense to use drawElements instead of drawArrays if I'm not going to share any vertices?
If I understood correctly, with drawElements I have to use multiple draw calls if the index array exceeds ~65k elements, because of the Uint16 limitation of the index buffer (in WebGL).
So could one say as a rule of thumb:
No shared vertices: use drawArrays, because it's just one big draw call.
Shared vertices: use drawElements, because GPU bandwidth could be saved, and this would result in better performance than drawArrays even if multiple draw calls are required?
The advantage of glDrawElements() is, in theory, simple: you use fewer vertices, and you get to use the post-T&L cache. The penalty is that you have to use an index array, and cache locality might be worse.
Comparisons are complicated by the fact that the post-T&L cache will vary in size depending on (for example) how old your graphics card is. Newer cards might have huge caches, older cards might have smaller ones.
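As a rough illustration of why the cache matters, the following Python sketch models the post-transform cache as a simple FIFO of recently transformed vertex indices. That model is an assumption for illustration only; real GPUs differ in cache size and replacement policy. It counts how many vertex shader invocations an indexed draw would cost.

```python
from collections import deque


def vertex_shader_runs(indices, cache_size):
    """Count vertex shader invocations for an indexed draw, modelling the
    post-transform cache as a FIFO holding the last `cache_size` indices."""
    cache = deque(maxlen=cache_size)  # the oldest entry drops off automatically
    runs = 0
    for i in indices:
        if i not in cache:  # cache miss: this vertex must be transformed again
            runs += 1
            cache.append(i)
    return runs
```

For a quad drawn with the indices [0, 1, 2, 0, 2, 3], a 16-entry cache gives 4 invocations, a 2-entry cache gives 5, and a 1-entry cache gives 6, the same as the non-indexed drawArrays path. This is the card-dependent variation described above.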
However, if you are not sharing any vertices between primitives, then there is no advantage to glDrawElements() whatsoever, and you should use glDrawArrays().
In more complicated scenarios, when you are comparing the cost of additional draw calls (with glDrawElements()) against the cost of extra vertex data and more vertex shader invocations (with glDrawArrays()), I would want to profile.
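On the 16-bit limitation raised in the question: the limit applies to the values stored in a Uint16 index buffer (which vertex each index can address), not to the number of indices a draw call uses. WebGL 1 can lift it via the OES_element_index_uint extension, and WebGL 2 accepts UNSIGNED_INT indices natively. Where 16-bit indices are still required, a large mesh can be split into batches whose local indices fit. A minimal Python sketch of that splitting follows; the function name and the greedy, triangle-order strategy are assumptions, not a WebGL API.

```python
# Largest vertex count per batch. Kept at 65535 so local indices stay in
# 0..65534, leaving 0xFFFF unused (WebGL 2 treats it as the primitive-restart index).
MAX_BATCH_VERTICES = 65535


def split_for_uint16(vertices, triangles, limit=MAX_BATCH_VERTICES):
    """Split an indexed triangle mesh into batches whose local indices fit in 16 bits.

    `triangles` is a list of (a, b, c) index triples into `vertices`; assumes limit >= 3.
    Returns a list of (batch_vertices, batch_indices) pairs, one per draw call.
    """
    batches = []
    local_of = {}        # global vertex index -> index within the current batch
    batch_vertices = []
    batch_indices = []
    for tri in triangles:
        # Distinct vertices of this triangle that the current batch lacks.
        missing = [v for v in set(tri) if v not in local_of]
        if len(batch_vertices) + len(missing) > limit:
            # Close the current batch; shared vertices get duplicated into the next.
            batches.append((batch_vertices, batch_indices))
            local_of, batch_vertices, batch_indices = {}, [], []
        for v in tri:
            if v not in local_of:
                local_of[v] = len(batch_vertices)
                batch_vertices.append(vertices[v])
            batch_indices.append(local_of[v])
    if batch_indices:
        batches.append((batch_vertices, batch_indices))
    return batches
```

Each batch then becomes one drawElements call with its own vertex and index buffers, which is the multiple-draw-call trade-off the question describes.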

When to store quaternion vs matrix in static and dynamic objects (data structure design)

My question is about design and possible suggestions for the following scenario:
I am writing a 3d visualizer. For my renderable objects I would like to store the minimum data possible (so quaternions are naturally nice for rotation).
At some point I must extract a Matrix for rendering which requires computation and temporary storage on every frame update (even for objects that do not change spatially).
Given that many objects remain static and don't need to be rotated locally would it make sense to store the matrix instead and thereby avoid the computation for each object each frame? Is there any best practice approach to this perhaps from a game engine design point of view?
I am currently a bit torn between storing the two extremes: either position+quaternion or a 4x3/4x4 matrix. Looking at openFrameworks (not necessarily trying to achieve the same goal as me), they seem to use a hybrid where they store a quaternion AND a matrix (the matrix always reflects the quaternion), so it's always ready when needed but needs to be updated along with every change to the quaternion.
More compact storage requires only 3 scalars, so Euler angles or exponential maps (Rodrigues) can be used. Quaternions are a good compromise between speed of conversion to a matrix and compactness.
From a design point of view, there is a good rule: "make all design decisions as LATE as possible". In your case, just encapsulate (isolate) the rotation (transformation) representation, so that in the future you can change the physical storage of the data in different states (file, memory, rendering and more). It also enables platform-specific optimizations, such as keeping the data on the GPU or the CPU.
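A minimal Python sketch of that encapsulation, under assumptions: the class and method names are hypothetical, rotation is stored as a unit quaternion (w, x, y, z), and the matrix is rebuilt lazily behind a dirty flag. That lazy cache is a variant of the hybrid described in the question (which updates the matrix eagerly on every quaternion change); either way, a static object pays for the quaternion-to-matrix conversion once rather than every frame.

```python
def quat_to_matrix(w, x, y, z):
    """3x3 rotation matrix (row-major, column-vector convention) from a unit quaternion."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


class Transform:
    """Hides how rotation is stored: callers set a quaternion and ask for a
    4x4 matrix, which is rebuilt only when something changed (dirty flag)."""

    def __init__(self):
        self._position = (0.0, 0.0, 0.0)
        self._rotation = (1.0, 0.0, 0.0, 0.0)  # (w, x, y, z): identity
        self._matrix = None                    # cached 4x4; None means "dirty"
        self.rebuilds = 0                      # how many conversions were done

    def set_rotation(self, w, x, y, z):
        self._rotation = (w, x, y, z)
        self._matrix = None

    def set_position(self, x, y, z):
        self._position = (x, y, z)
        self._matrix = None

    def matrix(self):
        if self._matrix is None:
            r = quat_to_matrix(*self._rotation)
            px, py, pz = self._position
            self._matrix = [r[0] + [px], r[1] + [py], r[2] + [pz], [0.0, 0.0, 0.0, 1.0]]
            self.rebuilds += 1
        return self._matrix
```

Because callers only see set_rotation(), set_position() and matrix(), the storage could later switch to a stored matrix, Euler angles or an exponential map without touching them.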
Been there.
First: keep in mind the omnipresent struggle of time against space (in computer science, processing time against memory requirements).
You said that you want to keep the minimum information possible at first (space), and then talked about a temporary matrix reflecting the quaternions, which is more of a time concern.
If you'll accept a tip, I would go for the matrices. Performance-wise they are the standard for 3D graphics, and their size easily becomes irrelevant next to the object data itself.
Just to give you an idea: on most GPUs, transforming a vector by the identity matrix (no change) is actually faster than checking whether it needs transforming and then doing nothing.
As for engines, I can't think of one that does not apply the transformations to every vertex every frame. Even if the objects stay in place, their positions have to go through the view and projection matrices.
(does this answer? Maybe I got you wrong)

OpenGL ES 2.0: glUseProgram vs glUniform performance

Which is faster, a single call to glUseProgram, or sending e.g. 6 or so floats via glUniform (batched or separately), and by approximately how much?
Can you describe in more detail the scenario where you think this affects the performance of the rendering pipeline? They offer completely different functionalities and I don't see why you would care about the performance of glUseProgram vs glUniform.
Now let's analyze what happens when you use these functions to get an idea of their cost.
When you call glUseProgram it changes several OpenGL rendering states, because we are going to use the new shaders attached to the program object. The specification says that the vertex and fragment programs are installed in the processors when you invoke this function. That alone seems costly enough to overshadow the cost of glUniform. Also, when you install new vertex and fragment programs, additional states of the rendering pipeline are changed to accommodate the number of texture units and the data layout used by the programs.
glUniform copies data from one memory location to another to specify the value of a uniform variable. The worst case would be copying matrices, which still seems less complex than glUseProgram.
But in the end, it all depends on the amount of data you are transferring with glUniform, on the underlying implementation of glUseProgram (it could be heavily optimized by the driver and have a very small cost), and on whether your engine is smart enough to group the geometry that uses the same program and draw it without changing states.
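The grouping mentioned in the last point is usually done by sorting draws by program before issuing them. A small Python sketch with plain tuples standing in for draw calls (no real GL calls; the names are illustrative) counts how many program binds an ordering needs. Reordering like this is only safe when draw order does not matter, e.g. for opaque geometry with depth testing; transparent geometry usually has to stay sorted by depth.

```python
def count_program_switches(draws):
    """Number of glUseProgram-style binds needed to issue `draws` in order,
    counting the first bind. Each draw is a (program_id, mesh_name) pair."""
    switches = 0
    current = None
    for program, _mesh in draws:
        if program != current:
            switches += 1
            current = program
    return switches


def batch_by_program(draws):
    """Reorder draws so all draws sharing a program are contiguous.
    sorted() is stable, so draws keep their relative order within a program."""
    return sorted(draws, key=lambda d: d[0])
```

For programs in the order 1, 2, 1, 2, 1 the original sequence needs 5 binds; after batch_by_program() it needs 2.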

How do you handle objects moving between quads when using quadtrees?

I'm trying to use quadtrees for collision detection in a game I'm making, but I'm not sure how to handle objects that move between different quads.
The only way I can think of is clearing out the whole tree each frame and then adding everything back in, but that seems like it could get CPU-intensive and not very efficient. Do you check each object every frame to see if it has moved outside the boundary of its current quad, and if so, remove it and re-add it? That again seems like it could be pretty inefficient, because you'd be performing collision checks on every moving object every frame.
Also, regarding quadtrees but unrelated to objects moving around in them: how do you handle multiple objects in the same quad? Most sites I've read say that you should only have one, maybe two, objects in a quad, and if you get more than that, you should push them down the tree. What if you had a situation like this? You have three circles, and they are all on the edges of the level below them, so they can't go any further down, but there are three all in the same level, which people say you shouldn't have.
I don't think it's particularly inefficient to implement your suggestion: check whether an object has moved outside its quad, and if so, remove and re-add it. Any object that moves from one frame to the next will need some collision detection performed on it anyway, surely? The quadtree operations are only performed if it moves between quads, and the CPU time spent there is probably overshadowed by the CPU time spent on the more precise "Does object A touch object B?" computations. So I don't know that you can do better.
On your 2nd question: I don't know how other people implement quadtrees, but I allow objects to occupy more than one quadtree, precisely for the reason you've given in your diagram (when an object straddles a boundary). So an object has a "current list of quads" instead of a "current quad".
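A minimal Python sketch of keeping a "current list of quads" per object, under a simplifying assumption: the quadtree is subdivided uniformly to a fixed depth over a square world, so the leaf quads an axis-aligned box overlaps can be found with arithmetic instead of a tree walk. The function name and box layout are illustrative; an adaptive quadtree would descend the tree instead.

```python
def overlapping_quads(box, world_size, depth):
    """Return the (col, row) leaf quads that an axis-aligned box overlaps, for a
    quadtree subdivided uniformly `depth` times over [0, world_size)^2.
    box = (min_x, min_y, max_x, max_y)."""
    cells = 2 ** depth             # leaf quads per side
    quad = world_size / cells      # side length of one leaf quad

    def clamp(i):
        # Keep boxes that poke outside the world inside the valid index range.
        return max(0, min(cells - 1, i))

    min_x, min_y, max_x, max_y = box
    c0, c1 = clamp(int(min_x // quad)), clamp(int(max_x // quad))
    r0, r1 = clamp(int(min_y // quad)), clamp(int(max_y // quad))
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

An object straddling a boundary simply gets several entries, and when it moves, only the difference between its old and new lists needs updating.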
Removing/re-adding can be optimized by moving up the quadtree instead of removing the item from the tree completely and then re-adding it: move to the "parent" quad and have the "parent" add it; if it doesn't fit in the "parent", go to the "grandparent", etc.
As for your second concern, you will need some flexibility: if all 3 are on an edge, then you can't lower them, but that should be (pardon the pun) an edge case.
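The climb-then-descend update can be sketched in Python as follows. Assumptions not stated in the answer: each object is stored in the deepest node whose bounds fully contain it, nodes split eagerly down to a fixed max_depth, and all names are hypothetical. Objects straddling a boundary simply stay in the parent, which is the edge case above.

```python
class QuadNode:
    """Quadtree node covering [x, x+size) x [y, y+size). Objects live in the
    deepest node whose bounds fully contain them."""

    def __init__(self, x, y, size, depth, max_depth, parent=None):
        self.x, self.y, self.size = x, y, size
        self.depth, self.max_depth = depth, max_depth
        self.parent = parent
        self.children = []
        self.objects = set()

    def contains(self, box):
        min_x, min_y, max_x, max_y = box
        return (self.x <= min_x and max_x < self.x + self.size and
                self.y <= min_y and max_y < self.y + self.size)

    def _split(self):
        half = self.size / 2
        self.children = [QuadNode(self.x + dx, self.y + dy, half,
                                  self.depth + 1, self.max_depth, self)
                         for dy in (0, half) for dx in (0, half)]

    def insert(self, obj):
        """Push obj down as long as a single child still fully contains it."""
        if self.depth < self.max_depth:
            if not self.children:
                self._split()
            for child in self.children:
                if child.contains(obj.box):
                    return child.insert(obj)
        self.objects.add(obj)  # straddles the children (or is at max depth)
        obj.node = self
        return self


class Obj:
    def __init__(self, box):
        self.box = box   # (min_x, min_y, max_x, max_y)
        self.node = None


def move(obj, new_box):
    """Re-home an object already in the tree: climb to the nearest ancestor
    that still contains it, then let that ancestor insert it downward."""
    obj.box = new_box
    node = obj.node
    node.objects.discard(obj)
    while node.parent is not None and not node.contains(new_box):
        node = node.parent
    return node.insert(obj)
```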

How to subdivide a 2d game world for better collision detection

I'm developing a game which features a sizeable square 2d playing area. The gaming area is tileless with bounded sides (no wrapping around). I am trying to figure out how I can best divide up this world to increase the performance of collision detection. Rather than checking each entity for collision with all other entities I want to only check nearby entities for collision and obstacle avoidance.
I have a few special concerns for this game world...
I want to be able to use a large number of entities in the game world at once. However, a percentage of entities won't collide with entities of the same type. For example, projectiles won't collide with other projectiles.
I want to be able to use a large range of entity sizes. I want there to be a very large size difference between the smallest entities and the largest.
There are very few static or non-moving entities in the game world.
I'm interested in using something similar to what's described in the answer here: Quadtree vs Red-Black tree for a game in C++?
My concern is how well will a tree subdivision of the world be able to handle large size differences in entities? To divide the world up enough for the smaller entities the larger ones will need to occupy a large number of regions and I'm concerned about how that will affect the performance of the system.
My other major concern is how to properly keep the list of occupied areas up to date. Since there's a lot of moving entities, and some very large ones, it seems like dividing the world up will create a significant amount of overhead for keeping track of which entities occupy which regions.
I'm mostly looking for any good algorithms or ideas that will help reduce the number of collision detection and obstacle avoidance calculations.
If I were you, I'd start off by implementing a simple BSP (binary space partitioning) tree. Since you are working in 2D, bounding-box checks are really fast. You basically need three classes: CBspTree, CBspNode and CBspCut (not really needed):
CBspTree has one root node instance of class CBspNode
CBspNode has an instance of CBspCut
CBspCut represents how you cut a set into two disjoint sets. This can neatly be solved by introducing polymorphism (e.g. CBspCutX or CBspCutY or some other cutting line). CBspCut also has two CBspNode instances.
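A minimal Python sketch of those three classes, under assumptions: boxes are (min_x, min_y, max_x, max_y) tuples, cuts are placed by hand (automatic splitting is discussed below), and items that straddle a cut stay at the inner node rather than being duplicated into both halves, which is one of the storage strategies discussed below. Only the class names come from the answer; everything else is illustrative.

```python
def overlaps(a, b):
    """Axis-aligned bounding-box intersection test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class CBspCut:
    """Splits space at `value`; subclasses choose the axis (polymorphism)."""
    axis = None  # 0 for x, 1 for y

    def __init__(self, value):
        self.value = value
        self.low, self.high = CBspNode(), CBspNode()  # the two disjoint halves

    def side(self, box):
        """-1 if box lies entirely below the cut, 1 if entirely above, 0 if it straddles."""
        lo, hi = box[self.axis], box[self.axis + 2]
        if hi < self.value:
            return -1
        if lo >= self.value:
            return 1
        return 0


class CBspCutX(CBspCut):
    axis = 0  # vertical cutting line x = value


class CBspCutY(CBspCut):
    axis = 1  # horizontal cutting line y = value


class CBspNode:
    def __init__(self):
        self.cut = None   # a CBspCut for inner nodes, None for leaves
        self.items = []   # (box, obj) pairs; straddling items stay at inner nodes

    def insert(self, box, obj):
        side = self.cut.side(box) if self.cut else 0
        if side == 0:
            self.items.append((box, obj))
        else:
            (self.cut.low if side < 0 else self.cut.high).insert(box, obj)

    def query(self, box, out):
        out.extend(o for b, o in self.items if overlaps(b, box))
        if self.cut:
            side = self.cut.side(box)
            if side <= 0:
                self.cut.low.query(box, out)
            if side >= 0:
                self.cut.high.query(box, out)


class CBspTree:
    """The only class the rest of the game talks to."""

    def __init__(self):
        self.root = CBspNode()

    def insert(self, box, obj):
        self.root.insert(box, obj)

    def query(self, box):
        out = []
        self.root.query(box, out)
        return out
```

Usage: set tree.root.cut = CBspCutX(50.0), insert a few boxes, and tree.query(area) returns only the objects whose boxes touch that area, which are the collision candidates.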
The interface to the divided world will be through the tree class, and it can be a really good idea to create one more layer on top of that, in case you would like to replace the BSP solution with, e.g., a quadtree once you're getting the hang of it. But in my experience, a BSP will do just fine.
There are different strategies for storing your items in the tree. What I mean is that you can choose to have, e.g., some kind of container in each node that holds references to the objects occupying that area. This means, though (as you are asking yourself), that large items will occupy many leaves, i.e. there will be many references to large objects, while very small items will show up in single leaves.
In my experience this doesn't have that large an impact. Of course it matters, but you'd have to do some testing to check whether it's really an issue. You could get around it by simply leaving those items at branch nodes in the tree, i.e. not storing them at "leaf level". This means you will find those objects quickly while traversing down the tree.
When it comes to your first question: if you are only going to use this subdivision for collision testing and nothing else, I suggest that things that can never collide are never inserted into the tree. A missile, for example, as you say, can't collide with another missile, which would mean that you don't even have to store the missile in the tree.
However, you might want to use the BSP for other things as well (for example, picking objects with the mouse); you didn't specify that, but keep it in mind. In that case, I propose that you store everything in the BSP and resolve the collision later on. Just ask the BSP for a list of objects in a certain area to get a limited set of possible collision candidates, and perform the check after that (assuming objects know what they can collide with, or some other external mechanism).
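One common way to let objects "know what they can collide with" is a pair of bitmasks per entity: a category bit and a mask of the categories it reacts to. That is a swapped-in technique rather than something the answer prescribes; the Python sketch below uses hypothetical category names and filters the candidates a spatial query returns before any exact tests.

```python
# Hypothetical collision categories, one bit each.
PLAYER, ENEMY, PROJECTILE, WALL = 1, 2, 4, 8


class Entity:
    def __init__(self, name, category, collides_with):
        self.name = name
        self.category = category            # the single bit this entity sets
        self.collides_with = collides_with  # bitmask of categories it reacts to


def can_collide(a, b):
    """Both sides must accept each other, so projectiles opt out of
    projectile-vs-projectile tests by leaving PROJECTILE out of their mask."""
    return bool(a.collides_with & b.category) and bool(b.collides_with & a.category)


def filter_candidates(entity, candidates):
    """Narrow the list returned by a spatial query before any exact tests."""
    return [c for c in candidates if c is not entity and can_collide(entity, c)]
```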
If you want to speed things up, you also need to take care of merging and splitting, i.e. when things are removed from the tree, a lot of nodes will become empty, or the number of items below some node level will drop below some merge threshold. Then you want to merge two subtrees into one node containing all the items. Splitting happens when you insert items into the world: when the number of items exceeds some split threshold, you introduce a new cut, which splits the world in two. These merge and split thresholds should be two constants that you can use to tune the efficiency of the tree.
Merging and splitting are mainly used to keep the tree balanced and to make sure that it works as efficiently as it can according to its specifications. This is really what you need to worry about. Moving things from one location to another, and thus updating the tree, is fast in my opinion. But merging and splitting might become expensive if you do them too often.
This can be avoided by introducing some kind of lazy merge and split system, i.e. you have some kind of dirty flagging or modify count. Batch up all operations that can be batched, i.e. moving 10 objects and inserting 5 might be one batch. Once that batch of operations is finished, you check if the tree is dirty and then you do the needed merge and/or split operations.
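The thresholds and the lazy dirty flag can be sketched in Python as below. To keep it short, this sketch makes several assumptions not in the answer: items are points rather than boxes, leaves split at the median coordinate alternating between x and y, the threshold values are arbitrary placeholders, and all names are hypothetical. Inserts, removals and moves only touch leaf contents and set a flag; splits and merges happen once, in end_batch().

```python
SPLIT_THRESHOLD = 4  # a leaf holding more items than this gets split
MERGE_THRESHOLD = 2  # a subtree holding fewer items than this gets merged


class Node:
    def __init__(self, depth=0):
        self.depth = depth
        self.items = {}                # obj -> (x, y); only leaves hold items
        self.axis = self.value = None  # cut axis (0 = x, 1 = y) and position
        self.low = self.high = None    # children; None for a leaf

    def is_leaf(self):
        return self.low is None

    def count(self):
        return len(self.items) if self.is_leaf() else self.low.count() + self.high.count()

    def all_items(self):
        if self.is_leaf():
            return dict(self.items)
        merged = self.low.all_items()
        merged.update(self.high.all_items())
        return merged

    def child_for(self, pos):
        return self.low if pos[self.axis] < self.value else self.high

    def add(self, obj, pos):
        if self.is_leaf():
            self.items[obj] = pos
        else:
            self.child_for(pos).add(obj, pos)

    def discard(self, obj, pos):
        if self.is_leaf():
            self.items.pop(obj, None)
        else:
            self.child_for(pos).discard(obj, pos)

    def split(self):
        axis = self.depth % 2                                  # alternate x and y cuts
        coords = sorted(p[axis] for p in self.items.values())
        value = coords[len(coords) // 2]                       # median coordinate
        low = {o: p for o, p in self.items.items() if p[axis] < value}
        if not low:                                            # all equal on this axis
            return
        self.axis, self.value = axis, value
        self.low, self.high = Node(self.depth + 1), Node(self.depth + 1)
        self.low.items = low
        self.high.items = {o: p for o, p in self.items.items() if p[axis] >= value}
        self.items = {}

    def rebalance(self):
        """Apply any pending splits and merges in this subtree."""
        if self.is_leaf():
            if len(self.items) > SPLIT_THRESHOLD:
                self.split()
                if not self.is_leaf():
                    self.low.rebalance()
                    self.high.rebalance()
        elif self.count() < MERGE_THRESHOLD:
            self.items = self.all_items()                      # collapse into a leaf
            self.axis = self.value = None
            self.low = self.high = None
        else:
            self.low.rebalance()
            self.high.rebalance()


class Tree:
    def __init__(self):
        self.root = Node()
        self.dirty = False

    def insert(self, obj, pos):
        self.root.add(obj, pos)
        self.dirty = True

    def remove(self, obj, pos):
        self.root.discard(obj, pos)
        self.dirty = True

    def move(self, obj, old_pos, new_pos):
        self.remove(obj, old_pos)
        self.insert(obj, new_pos)

    def end_batch(self):
        """Call once per batch of operations; restructures only if needed."""
        if self.dirty:
            self.root.rebalance()
            self.dirty = False
```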
Post some comments if you want me to explain further.
Cheers!
Edit
There are many things that can be optimized in the tree. But as you know, premature optimization is the root of all evil, so start off simple. For example, you might create a generic callback system that you can use while traversing the tree. This way you don't have to query the tree for a list of objects that matched the bounding-box "question"; instead, you can just traverse down the tree and execute the callback each time you hit something: "If this bounding box I'm providing intersects you, then execute this callback with these parameters."
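A Python sketch of the callback idea, with a deliberately generic node (a region box, the items stored there, and child nodes; the names are illustrative): the traversal prunes subtrees whose region misses the query box and invokes the callback for each hit, so no intermediate list is built.

```python
def overlaps(a, b):
    """Axis-aligned box test; boxes are (min_x, min_y, max_x, max_y)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Node:
    def __init__(self, box, items=(), children=()):
        self.box = box                  # region covered by this node
        self.items = list(items)        # (box, obj) pairs stored here
        self.children = list(children)


def visit(node, query_box, callback, *args):
    """Call callback(obj, *args) for every stored item whose box intersects
    query_box, walking down the tree without collecting results first."""
    if not overlaps(node.box, query_box):
        return                          # prune the whole subtree
    for box, obj in node.items:
        if overlaps(box, query_box):
            callback(obj, *args)
    for child in node.children:
        visit(child, query_box, callback, *args)
```

For example, visit(root, area, resolve_hit, projectile) would call a hypothetical resolve_hit(obj, projectile) for every object touching area.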
You most definitely want to check out this list of collision detection resources from gamedev.net. It's full of resources on game development conventions.
For other than collision detection only, check their entire list of articles and resources.
My concern is how well will a tree subdivision of the world be able to handle large size differences in entities? To divide the world up enough for the smaller entities the larger ones will need to occupy a large number of regions and I'm concerned about how that will affect the performance of the system.
Use a quad tree. For objects that exist in multiple areas you have a few options:
Store the object in both branches, all the way down. Everything ends up in leaf nodes but you may end up with a significant number of extra pointers. May be appropriate for static things.
Split the object on the zone border and insert each part in their respective locations. Creates a lot of pain and isn't well defined for a lot of objects.
Store the object at the lowest point in the tree you can. Sets of objects now exist in leaf and non-leaf nodes, but each object has one pointer to it in the tree. Probably best for objects that are going to move.
By the way, the reason you're using a quad tree is because it's really really easy to work with. You don't have any heuristic based creation like you might with some BSP implementations. It's simple and it gets the job done.
My other major concern is how to properly keep the list of occupied areas up to date. Since there's a lot of moving entities, and some very large ones, it seems like dividing the world up will create a significant amount of overhead for keeping track of which entities occupy which regions.
There will be overhead to keeping your entities in the correct spots in the tree every time they move, yes, and it can be significant. But the whole point is that you're doing much much less work in your collision code. Even though you're adding some overhead with the tree traversal and update it should be much smaller than the overhead you just removed by using the tree at all.
Obviously, depending on the number of objects, the size of the game world, etc., the trade-off might not be worth it. Usually it turns out to be a win, but it's hard to know without doing it.
There are lots of approaches. I'd recommend setting some specific goals (e.g., x collision tests per second with a ratio of y between the smallest and largest entities), and doing some prototyping to find the simplest approach that achieves those goals. You might be surprised how little work you have to do to get what you need. (Or it might be a ton of work, depending on your particulars.)
Many acceleration structures (e.g., a good BSP) can take a while to set up and thus are generally inappropriate for rapid animation.
There's a lot of literature out there on this topic, so spend some time searching and researching to come up with a list of candidate approaches. Mock them up and profile.
I'd be tempted just to overlay a coarse grid over the play area to form a 2D hash. If each grid cell is at least the size of the largest entity, then you only ever have 9 grid squares to check for collisions, and it's a lot simpler than managing quadtrees or arbitrary BSP trees. The overhead of determining which coarse grid square you're in is typically just 2 arithmetic operations, and when a change is detected, the grid just has to remove one reference/ID/pointer from one square's list and add the same to another square.
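A minimal Python sketch of such a coarse grid, assuming entities are indexed by their centre point and the cell edge is at least the diameter of the largest entity (so any two touching entities lie in the same or adjacent cells). Class and method names are illustrative.

```python
from collections import defaultdict


class Grid:
    """Coarse uniform grid ("2D hash"). With cells at least as large as the
    largest entity, anything touching an entity is in its cell or one of the
    8 neighbouring cells."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (cx, cy) -> set of entity ids
        self.cell_of = {}              # entity id -> its current (cx, cy)

    def cell(self, x, y):
        # Two floor divisions locate the cell for a centre point.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, ent_id, x, y):
        """Insert or move an entity; the grid changes only if the cell changed."""
        new = self.cell(x, y)
        old = self.cell_of.get(ent_id)
        if old == new:
            return
        if old is not None:
            self.cells[old].discard(ent_id)
        self.cells[new].add(ent_id)
        self.cell_of[ent_id] = new

    def nearby(self, x, y):
        """Candidate ids from the 3x3 block of cells around (x, y)."""
        cx, cy = self.cell(x, y)
        found = set()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                found |= self.cells.get((cx + dx, cy + dy), set())
        return found
```

Projectiles can be kept out of the grid, as described next: a projectile calls nearby() with its own position to get candidates, and the other entities never have to test against it.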
Further gains can be had from keeping the projectiles out of the grid/tree/etc lookup system - since you can quickly determine where the projectile would be in the grid, you know which grid squares to query for potential collidees. If you check collisions against the environment for each projectile in turn, there's no need for the other entities to then check for collisions against the projectiles in reverse.
