Why Directx11 doesn't support multiple index buffers in IASetIndexBuffer

Why Directx11 doesn't support multiple index buffers in IASetIndexBuffer - directx-11

You can set multiple vertex buffers with IASetVertexBuffers,
but there is no plural version of IASetIndexBuffer.
What is the point of creating (non-interleaved) multiple vertex buffer datas if you wont be able to refer them with individual index buffers
(assume i have a struct called vector3 with 3 floats x,y,z)
let say i have a model of human with 250.000 vertices and 1.000.000 triangles;
i will create a vertex buffer with size of 250.000 * sizeof(vector3)
for vertex LOCATIONS and also,
i will create another vertex buffer with size of 1.000.000 * 3 *
sizeof(vector3) for vertex NORMALS (and propably another for diffuse
texture)
i can set these vertex buffers like:
ID3D11Buffer* vbs[2] = { meshHandle->VertexBuffer_Position, meshHandle->VertexBuffer_Normal };
uint strides[] = { Vector3f_size, Vector3f_size };
uint offsets[] = { 0, 0 };
ImmediateContext->IASetVertexBuffers(0, 2, vbs, strides, offsets);
how can i set seperated index buffers for these vertex datas if IASetIndexBuffer only supports 1 index buffer
and also (i know there are techniques for decals like creating extra triangles from the original model but)
let say i want to render a small texture like a SCAR on this human model's face (let say forehead), and this scar will only spread through 4 triangles,
is it possible creating a uv buffer (with only 4 triangles) and creating 3 different index buffers for locations, normals and UVs for only 4 triangles but using the same original vertex buffers (same data from full human model). i dont want to create tonnes of uv data which will never be rendered beside characters forehead (and i dont want to re-use, re-create vertex position datas for this secondary texture layers (decals))
EDIT:
i realized i didnt properly ask a question, so my question is:
did i misunderstand non-interleaved model structure (is it being used
for some other reason instead having non-aligned vertex components)?
or am i approaching non-interleaved structure wrong (is there a way
defining multiple non-aligned vertex buffers and drawing them with
only one index buffer)?

The reason you can't have more than on Index buffer bound at a time is because you need to specify a Fixed number of primitives when you make a call to DrawIndexed.
So in your example, if you have one index buffer with 3000 primitives and another with 12000 primitives, the pipeline would have no idea how to match the first set to the second set.
If is generally normal that some vertex data (mostly position) eventually requires to be duplicated across your vertex buffer, since your buffers requires to be the same size.
Index buffer works as a "lookup table", so your data across vertex buffers need to be consistent.
Non interleaved model structure has many advantages:
First using separate buffers can lead to better performances if you need to draw the models several times and some draws do not require attributes.
For example, when you render a shadow map, you will need to access only positions, so in interleaved mode, you still need to bind a large data structure and access elements in a non contiguous way (Input Assembler does that). In case of non interleaved data, Position will be contiguous in memory so fetch will be much faster.
Also non interleaved allows to more easily do some processing on some attributes, it is common nowadays to perform some displacement or skinning in a compute shader, so in that case you can also easily create another Position+Normal buffer, perform your skinning on those and attach them in the pipeline once processed (And you can keep UV buffer intact).
If you want to draw non aligned Vertex buffers, you could use Structured Buffers instead (and use SV_VertexID and some custom lookup tables in your shader code).

Related

How can I properly manage data in modern OpenGL while considering performance?

In modern OpenGL (3.x+), you create buffer objects which contain vertex attributes, such as positions, colors, normals, texture coordinatess, & indices.
These buffers are then assigned to a corresponding vertex array object (VAO) which essentially contains pointers to all of the data as well as the data's format.
There are many tutorials out there for how to create a VAO and how to use it; unfortunately, it isn't clear how VAO's should be used for larger applications or games.
For example, a game might contain many 3D models, and it seems appropriate to separate each model by a different VAO.
On the other hand, a particle system contains many disconnected primitives traveling independent of one another. In this scenario, using a single VAO per system might improve performance in CPU-GPU transfers. However, in this case, the primitives need to be translated differently than one another, so it might seem viable to separate each particle into a very tiny VAO.
Question:
For a large quantity of small data sets (such as a particle system of quads), should all of the data be packed into 1 VAO or divided into many VAO's? What are the performance benefits/drawbacks in each method?
Assumming 1 VAO is used, the only apparent way to translate each independent sub-unit of data is to modify the actual position information and reload it into the GPU. Doing this many times is costly in terms of time performance.
Assuming many VAO's are used, then the GPU must store duplicate formatting information for each VAO. This seems to be costly in terms of space (but I'm not sure if this is necessarily slow).
Side-Note:
Yes, I'm personally interested in managing a particle system. To keep this question more generic, and more useful for others, I am asking about VAO management as a whole. I am curious what management methods are more suitable vs others when considering the type of data being stored and when considering what type of performance is desired (time/space).
VAO creation is described well here:
https://www.opengl.org/wiki/Vertex_Specification

In the case of particles it would be best to use instanced rendering - where you can render all the particles in a single draw call but assign a different position for each one as an attribute. You can update an existing buffer using glSubData. That way you could update the position on the CPU side between frames, and then update the buffer.
In more complex examples you can instance whichever attributes you want to.
The way I call instanced rendering and set it up in my code is as follows:
void CreateInstancedAttrib(unsigned int attribNum,GLuint VAO,GLuint& posVBO,int numInstances){
glBindVertexArray(VAO);
posVBO = CreateVertexArrayBuffer(0, sizeof(vec3),numInstances,GL_DYNAMIC_DRAW);
glEnableVertexAttribArray(attribNum);
glVertexAttribPointer(attribNum, 3, GL_FLOAT, GL_FALSE, sizeof(vec3), 0);
glVertexAttribDivisor(attribNum, 1);
glBindVertexArray(0);
}
Where posVBO is the usual attrib data and the lines following set up the buffer for positions.
When rendering:
void RenderInstancedStaticMesh(const StaticMesh& mesh, MaterialUniforms& uniforms,const vec3* positions){
for (unsigned int meshNum = 0; meshNum < mesh.m_numMeshes; meshNum++){
if (mesh.m_meshData[meshNum]->m_hasTexture){
glBindTexture(GL_TEXTURE_2D, mesh.m_meshData[meshNum]->m_texture);
}
glBindVertexArray(mesh.m_meshData[meshNum]->m_vertexBuffer);
glBindBuffer(GL_ARRAY_BUFFER, mesh.m_meshData[meshNum]->m_instancedDataBuffer);
glBufferSubData(GL_ARRAY_BUFFER,0, sizeof(vec3) * mesh.m_numInstances, positions);
glUniform3fv(uniforms.diffuseUniform, 1, &mesh.m_meshData[meshNum]->m_material.diffuse[0]);
glUniform3fv(uniforms.specularUniform, 1, &mesh.m_meshData[meshNum]->m_material.specular[0]);
glUniform3fv(uniforms.ambientUniform, 1, &mesh.m_meshData[meshNum]->m_material.ambient[0]);
glUniform1f(uniforms.shininessUniform, mesh.m_meshData[meshNum]->m_material.shininess);
glDrawElementsInstanced(GL_TRIANGLES, mesh.m_meshData[meshNum]->m_numFaces * 3,
GL_UNSIGNED_INT, 0,mesh.m_numInstances);
}
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindVertexArray(0);
}
That's a lot to take in but the important lines are DrawElementsInstance and glBufferSubData.
If you do a few googles on both functions I'm sure you will come to understand how instanced rendering works.
Anymore questions please ask

The general rule is, that you want to minimize the amount of draw calls. If you put things into individual VAOs you have to perform a draw call for each VAO. Also switching between VAOs and VBOs comes with a cost either. Don't think of VAOs and VBOs as "model" containsers, but as memory pools, where each VBO / VAO should be used to coalesce data of identical properties.
A particle system is the perfect candidate to put everything into a single VBO/VAO. In the usual case using instanced rendering where the VBO contain information about where to place each particle.

Is it okay to send an array of objects to vertex shader?

Due to performance issues drawing thousands of similar triangles (with different attributes), I would like to draw all of these using a single call to drawElements. But in order to draw each triangle with their respective attributes (ex: world location, color, orientation) I believe i need to send an array buffer to my vertex shader, where the array is a list of all the triangle attributes.
If this approach is indeed the standard way to do it, then i am delighted to know i have the theory correct. I just need to know how to send an array buffer to the shader. Note that currently i know how to send a multiple attributes and uniforms (though they are not contiguous in memory, which is what im looking for).
If not, i'd appreciate it if a resident expert can point me in the right direction.
I have a related question because I am having trouble actually implementing a VBO.
How to include model matrix to a VBO?

You do have the theory correct, let me just clear a few things...
You can only "send vertex array" to the vertex shader via pointers using attributes so this part of the code stays pretty much the same. What you do seem to be looking for are 2 optimisations, putting the whole buffer on the GPU and using interleaved vertex data.
To put the whole buffer on the GPU you need to use VBOs as already mentioned in the comment. By doing that you create a raw buffer on the GPU to which you can put any data you want and even modify them in runtime if needed. In your case that would be vertex data. To use non interleaved data you would probably create a buffer for each, position, colour, orientation...
To use interleaved data you need to put them into the buffer sequentially, if possible it is best to create a data structure that holds all the vertex data (of a single vertex, not the whole array) and send those to the buffer (a simple primitive array will work as well). A C example of such structure:
typedef union {
struct {
float x,y,z; //position
float r,g,b,a; //color
float ox,oy,oz; //orientation
};
struct {
float position[3];
float color[4];
float orientation[3];
};
}Vertex;
What you need to do then is set the correct pointers when using this data. In the VBO you start with NULL (0) and that would represent the position in this case, to set colour you would have to then use ((float *)NULL)+3 or use some conveniences such as offsetof(Vertex, color) in C. With that you also need to set the stride, that would be the size of the Vertex structure so you could use sizeof(Vertex) or hardcoded sizeof(float)*(3+4+3).
After this all you need to watch for is correct buffer binding/unbinding while rest of your code should be exactly the same.

Modify Buffer in OpenGL

I have a modifiable terrain which is stored in a vertex buffer. Because of its large number of vertices, I do not want to upload all vertices again every time the terrain is modified. What I do by now is to split the terrain into smaller chunks so that I only have to recreate the buffer of the area containing the modification of the terrain.
But how can I just add or remove some vertices of an existing buffer?

You can either use glBufferSubData as datenwolf said, or if you are planning on making a lot of modifications and accessing randomly the data, you may want to map the buffer into client memory using glMapBuffer and later unmap it with glUnmapBuffer. (Then, based on the access specifiers you chose, you can edit the data as a C array)

You can change data in an existing buffer using glBufferSubData

How to implement batches using webgl?

I am working on a small game using webgl. Within this game I have some kind of forest which consists out of many (100+) tree objects. Because I only have a few different tree models, I rotate and scale these models in a different way before I display them.
At the moment I loop over all trees to display them:
for (var tree in trees) {
tree.display();
}
While the display() method of tree looks like:
display : function() { // tree
this.treeModel.setRotation(this.rotation);
this.treeModel.setScale(this.scale);
this.treeModel.setPosition(this.position);
this.treeModel.display();
}
Many tree objects share the same treeModel object, so I have to set rotation/scale/position of the model everytime before I display it. The rotation/scale/position values are different for every tree.
The display method of treeModel does all the gl stuff:
display : function() { // treeModel
// bind texture
// set uniforms for projection/modelview matrix based on rotation/scale/position
// bind buffers
// drawArrays
}
All tree models use the same shader but can use different textures.
Because a single tree model consists only out of a few triangles I want to combine all trees into one VBO and display the whole forest with one drawArrays() call.
Some assumptions to make talking about numbers easier:
There are 250 trees to display
There are 5 different tree models
Every tree model has 50 triangles
Questions I have:
At the moment I have 5 buffers that are 50 * 3 * 8 (position + normal + texCoord) * floatSize bytes large. When i want to display all trees with one vbo i would have a buffer with 250 * 50 * 3 * 8 * floatSize byte size. I think I can't use an index buffer because I have different position values for every tree (computed out of the position value of the tree model and the tree position/scale/rotation). Is this correct or is there still a way I can use index buffers to reduce the buffer size at least a bit? Maybe there are other ways to optimize this?
How to handle different textures of the tree models? I can bind all textures to different texture units but how can I decide within the shader which texture should be used for the fragment that is currently displayed?
When I want to add a new tree (or any other kind of object) to this buffer at runtime: Do I have to create a new buffer and copy the content? I think new values can't be added by using glMapBuffer. Is this correct?

Index element buffers can only reach over attributes that are equal to or below 65535 in length, so you need to use drawArrays instead. It's usually not a big loss.
You can add trees to the end of the buffers using GL.bufferSubData.
If your textures are in reasonable sized (like 128x128 or 256x256), you can probably merge them into one big texture and handle the whole thing with the UV-coords. If not, you can add another attribute saying what texture the vertex belongs to and have a condition in the vertex shader, alternatively an array of sampler2Ds (not sure it works, never tried it). Remember that conditions in shaders are pretty slow.
If you decide to stick to your current solution, make sure to sort the trees so the once using the same textures are rendered after each other - keeping state switching down is essential, always.

A few thoughts:
Once you plant a tree in your world, do you ever modify it? Will it animate at all? Or is it just static geometry? If it's truly static, you could always build a single buffer with several copies of each tree. As you append trees, first apply (in Javascript) that instance's world transform to the vertices. If using triangle strips, you can link trees together using degenerate polygons.
You could roll your own pseudo-instanced drawing:
Encode an instance ID in the array buffer. Just set this to the same value for all vertices that are part of the same tree instance. I seem to recall that you can't have non-floaty vertex attributes in ES GLSL (maybe that's a Chrome limitation), so you will need to bring it in as a float but use it as an int. Since it's coming in as a float, you will have to deal with the fact that it's interpolated across your triangle, and so the value will have minor fluctuations - but simply rounding to the nearest integer fixes that right up.
Use a separate texture (which I will call the data texture) to encode all the per-instance information. In your vertex shader, look at the instance ID of the current vertex and use that to compute a texture coordinate in the data texture. Pull out whatever you need to transform the current vertex, and apply it. I think this is called a "dependent texture read", which is generally frowned upon because it can cause performance issues, but it might help you batch your geometry, which can help solve performance issues. If you're interested, you'll have to try it and see what happens.
Hope for an extension to support real instanced drawing.

Your current approach isn't so bad. I'd say: Stick with it until you hit some wall.
50 triangles is already a reasonable batch size for a single drawElements/drawArrays call. It's not optimal, but also not so bad. So for every tree change the paramters like location, texture and maybe shape through uniforms. Then do a draw call for each tree. Also a total of 250 drawElements calls isn't so bad either.
So I'd use one single VBO that contains all the used tree geometry variants. I'd actually split up the trees into building blocks, so that I could recombine them for added variety. And for each tree set appropriate offsets into the VBO before calling drawArrays or drawElements.
Also don't forget that you can do a very cheap field of view culling of each tree.

Performance of using a large number of small VBOs

I wanted to know whether OpenGL VBOs are meant to be used only for large geometry arrays, or whether it makes sense to use them even for small arrays. I have code that positions a variety of geometry relative to other geometry, but some of the "leaf" geometry objects are quite small, 10x10 quad spheres and the like (200 triangles apiece). I may have a very large number of these small leaf objects. I want to be able to have a different transform matrix for each of these small leaf objects. It seems I have 2 options:
Use a separate VBO for each leaf object. I may end up with a large number of VBOs.
I could store data for multiple objects in one VBO, and apply the appropriate transforms myself when changing the data for a given leaf object. This seems strange because part of the point of OpenGL is to efficiently do large numbers of matrix ops in hardware. I would be doing in software something OpenGL is designed to do very well in hardware.
Is having a large number of VBOs inefficient, or should I just go ahead and choose option 1? Are there any better ways to deal with the situation of having lots of small objects? Should I instead be using vertex arrays or anything like that?

The most optimal solution if your data is static and consists of one object would be to use display lists and one VBO for only one mesh. Otherwise the general rule is that you want to avoid doing anything other than rendering in the main draw loop.
If you'll never need to add or remove an object after initialization (or morph any object) then it's probably more efficient to bind a single buffer permanently and change the stride/offset values to render different objects.
If you've only got a base set of geometry that will remain static, use a single VBO for that and separate VBOs for the geometry that can be added/removed/morphed.
If you can drop objects in or remove objects at will, each object should have it's own VBO to make memory management much simpler.

some good info is located at: http://www.opengl.org/wiki/Vertex_Specification_Best_Practices
I think that 200 triangles per one mesh is not so small number and maybe the performance with VBO for each of that meshes will not decrease so much. Unfortunately it is depended on hardware spec.
One huge buffer will no gain huge performance difference... I think that the best option would be to store several (but not all) objects per one VBO.
renderin using one buffer:
there is no problem with that... you simply have one buffer that is bound and then you can use glDrawArrays with different parameters. For instance if one mesh consists of 100 verts, and in th buffer you have 10 of those meshes you can use
glDrawArrays(triangles, 0, 100);
glDrawArrays(triangles, 100, 100);
glDrawArrays(triangles, ..., 100);
glDrawArrays(triangles, 900, 100);
in that way you minimize changing buffers and still you are able to render it quite efficient.
Are those "small" objects the same? do they have the same geometry but different transformations/materials? Because maybe it is worth using "instancing"?

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio