How to implement batches using webgl? - opengl-es

I am working on a small game using WebGL. Within this game I have a kind of forest which consists of many (100+) tree objects. Because I only have a few different tree models, I rotate and scale these models differently before I display them.
At the moment I loop over all trees to display them:
for (var i = 0; i < trees.length; i++) {
    trees[i].display();
}
While the display() method of tree looks like:
display : function() { // tree
    this.treeModel.setRotation(this.rotation);
    this.treeModel.setScale(this.scale);
    this.treeModel.setPosition(this.position);
    this.treeModel.display();
}
Many tree objects share the same treeModel object, so I have to set the rotation/scale/position of the model every time before I display it. The rotation/scale/position values differ for every tree.
The display method of treeModel does all the gl stuff:
display : function() { // treeModel
    // bind texture
    // set uniforms for projection/modelview matrix based on rotation/scale/position
    // bind buffers
    // drawArrays
}
All tree models use the same shader but can use different textures.
Because a single tree model consists of only a few triangles, I want to combine all trees into one VBO and display the whole forest with a single drawArrays() call.
Some assumptions to make talking about numbers easier:
There are 250 trees to display
There are 5 different tree models
Every tree model has 50 triangles
Questions I have:
At the moment I have 5 buffers, each 50 * 3 * 8 (position + normal + texCoord) * floatSize bytes large. If I display all trees with one VBO, I would need a buffer of 250 * 50 * 3 * 8 * floatSize bytes. I think I can't use an index buffer because every tree has different position values (computed from the tree model's vertex positions and the tree's position/scale/rotation). Is this correct, or is there still a way to use index buffers to reduce the buffer size at least a bit? Are there other ways to optimize this?
How do I handle the different textures of the tree models? I can bind all textures to different texture units, but how can I decide within the shader which texture should be used for the fragment currently being displayed?
When I want to add a new tree (or any other kind of object) to this buffer at runtime: do I have to create a new buffer and copy the content? I don't think new values can be added using glMapBuffer. Is this correct?

Element index buffers in WebGL use 16-bit indices, so they can only address vertices at positions up to 65,535; past that you need to use drawArrays instead. It's usually not a big loss.
You can append trees to the end of the buffer using gl.bufferSubData.
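The append can be sketched like this; the up-front capacity allocation and the FLOATS_PER_VERTEX / usedFloats bookkeeping names are assumptions for illustration, not from the original post:

```javascript
// Appending one tree's vertex data to a shared buffer with gl.bufferSubData.
// The buffer must have been allocated large enough up front, e.g. with
// gl.bufferData(gl.ARRAY_BUFFER, capacityInBytes, gl.DYNAMIC_DRAW).
var FLOATS_PER_VERTEX = 8;   // position (3) + normal (3) + texCoord (2)
var BYTES_PER_FLOAT = 4;

function appendToBuffer(gl, usedFloats, treeVertexData) {
  var byteOffset = usedFloats * BYTES_PER_FLOAT;  // where the new tree starts
  if (gl) {
    gl.bufferSubData(gl.ARRAY_BUFFER, byteOffset, treeVertexData);
  }
  return usedFloats + treeVertexData.length;      // new write position
}

// Offset math only (no GL context needed): one 50-triangle tree model.
var used = appendToBuffer(null, 0, new Float32Array(50 * 3 * FLOATS_PER_VERTEX));
// used is now 1200 floats; the next tree starts at byte offset 4800.
```

Tracking the write position this way also tells you the `first` argument for a later per-tree drawArrays call, if you ever need to draw a subset.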
If your textures are of reasonable size (like 128x128 or 256x256), you can probably merge them into one big texture and handle the whole thing through the UV coordinates. If not, you can add another attribute saying which texture the vertex belongs to, pass it to the fragment shader, and branch on it there; alternatively, use an array of sampler2Ds (not sure that works, never tried it). Remember that conditionals in shaders are pretty slow.
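A sketch of the atlas approach, assuming a simple square grid of equally sized tiles; tilesPerRow and tileIndex are illustrative names, not from the answer:

```javascript
// Remap a model's original [0,1] UVs into one tile of a grid texture atlas.
function atlasUV(u, v, tileIndex, tilesPerRow) {
  var tileSize = 1 / tilesPerRow;
  var col = tileIndex % tilesPerRow;             // tile column in the atlas
  var row = Math.floor(tileIndex / tilesPerRow); // tile row in the atlas
  return [
    (col + u) * tileSize,  // shift + scale into the tile's horizontal range
    (row + v) * tileSize   // likewise vertically
  ];
}

// Tree model #3 in a 2x2 atlas: its (0.5, 0.5) maps into the last tile.
var uv = atlasUV(0.5, 0.5, 3, 2);  // → [0.75, 0.75]
```

In practice you also want some padding between tiles, otherwise mipmapping and linear filtering bleed neighboring tiles into each other at the edges.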
If you decide to stick with your current solution, make sure to sort the trees so the ones using the same texture are rendered one after another; keeping state switches down is always essential.
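A minimal sketch of the sorting idea, assuming each tree carries an illustrative textureId field, counting how many texture binds a draw loop would issue:

```javascript
// Count how often the draw loop would have to switch textures.
function countTextureBinds(trees) {
  var binds = 0, bound = null;
  for (var i = 0; i < trees.length; i++) {
    if (trees[i].textureId !== bound) {  // would be a gl.bindTexture(...) call
      bound = trees[i].textureId;
      binds++;
    }
    // trees[i].display() would go here
  }
  return binds;
}

var trees = [{textureId: 0}, {textureId: 1}, {textureId: 0}, {textureId: 1}];
var unsorted = countTextureBinds(trees);                        // 4 binds
trees.sort(function (a, b) { return a.textureId - b.textureId; });
var sorted = countTextureBinds(trees);                          // 2 binds
```

With 250 trees and 5 textures, sorting drops the worst case from 250 binds to 5.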

A few thoughts:
Once you plant a tree in your world, do you ever modify it? Will it animate at all, or is it just static geometry? If it's truly static, you could always build a single buffer with several copies of each tree: as you append trees, first apply (in JavaScript) that instance's world transform to the vertices. If you use triangle strips, you can link trees together with degenerate triangles.
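The "bake the transform in JavaScript while appending" idea might look like this; it applies only uniform scale plus translation to keep it short (a full version would apply the rotation matrix as well), and all names are illustrative:

```javascript
// Bake a per-tree transform into vertex positions while appending to one
// big array that will later be uploaded once with gl.bufferData.
function appendInstance(out, modelPositions, scale, tx, ty, tz) {
  for (var i = 0; i < modelPositions.length; i += 3) {
    out.push(modelPositions[i]     * scale + tx,
             modelPositions[i + 1] * scale + ty,
             modelPositions[i + 2] * scale + tz);
  }
}

var model = [0, 0, 0,  1, 0, 0,  0, 1, 0];   // one triangle of a tree model
var forest = [];
appendInstance(forest, model, 2, 10, 0, 5);  // tree #1: scaled 2x at (10,0,5)
appendInstance(forest, model, 1, -4, 0, 7);  // tree #2: unscaled at (-4,0,7)
// forest now holds 6 pre-transformed vertices, ready for one upload and
// one drawArrays call covering the whole forest.
```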
You could roll your own pseudo-instanced drawing:
Encode an instance ID in the array buffer: just set it to the same value for all vertices that belong to the same tree instance. I seem to recall that you can't have non-float vertex attributes in ES GLSL (maybe that's a Chrome limitation), so you will need to bring it in as a float but use it as an int. Since it comes in as a float, the interpolated value may show minor fluctuations across your triangle, but simply rounding to the nearest integer fixes that right up.
Use a separate texture (which I will call the data texture) to encode all the per-instance information. In your vertex shader, look at the instance ID of the current vertex and use that to compute a texture coordinate in the data texture. Pull out whatever you need to transform the current vertex, and apply it. I think this is called a "dependent texture read", which is generally frowned upon because it can cause performance issues, but it might help you batch your geometry, which can help solve performance issues. If you're interested, you'll have to try it and see what happens.
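The vertex-shader lookup could be sketched (in JavaScript, for illustration) as: round the incoming float ID, then turn it into a texel-center coordinate into a 1D data texture. dataTexWidth is an assumed parameter, not from the answer:

```javascript
// Turn an interpolated instance ID into a data-texture coordinate.
function instanceTexCoord(interpolatedId, dataTexWidth) {
  var id = Math.round(interpolatedId);  // undo tiny interpolation error
  return (id + 0.5) / dataTexWidth;     // sample at the texel center
}

// An ID of 7 that arrived as 6.99987 still addresses texel 7 of a
// 256-wide data texture.
var u = instanceTexCoord(6.99987, 256);  // (7 + 0.5) / 256
```

Sampling at the texel center (the + 0.5) matters: sampling exactly on a texel boundary would make the fetched per-instance data depend on rounding inside the texture unit.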
Hope for an extension to support real instanced drawing.

Your current approach isn't so bad. I'd say: stick with it until you hit a wall.
50 triangles is already a reasonable batch size for a single drawElements/drawArrays call. It's not optimal, but not so bad either. So for every tree, change parameters like location, texture, and maybe shape through uniforms, then issue a draw call per tree. A total of 250 drawElements calls isn't bad either.
So I'd use one single VBO that contains all the tree geometry variants. I'd actually split the trees up into building blocks so I could recombine them for added variety, and for each tree set the appropriate offsets into the VBO before calling drawArrays or drawElements.
Also don't forget that you can do a very cheap field-of-view culling of each tree.

Related

Why Directx11 doesn't support multiple index buffers in IASetIndexBuffer

You can set multiple vertex buffers with IASetVertexBuffers,
but there is no plural version of IASetIndexBuffer.
What is the point of creating multiple (non-interleaved) vertex buffers if you won't be able to reference them with individual index buffers?
(Assume I have a struct called vector3 with 3 floats x, y, z.)
Say I have a human model with 250,000 vertices and 1,000,000 triangles.
I will create a vertex buffer of size 250,000 * sizeof(vector3) for vertex LOCATIONS, and
I will create another vertex buffer of size 1,000,000 * 3 * sizeof(vector3) for vertex NORMALS (and probably another for the diffuse texture).
I can set these vertex buffers like:
ID3D11Buffer* vbs[2] = { meshHandle->VertexBuffer_Position, meshHandle->VertexBuffer_Normal };
uint strides[] = { Vector3f_size, Vector3f_size };
uint offsets[] = { 0, 0 };
ImmediateContext->IASetVertexBuffers(0, 2, vbs, strides, offsets);
How can I set separate index buffers for these vertex data if IASetIndexBuffer only supports one index buffer?
Also (I know there are techniques for decals, like creating extra triangles from the original model), say I want to render a small texture like a SCAR on this human model's face (say the forehead), and this scar will only spread across 4 triangles. Is it possible to create a UV buffer (with only 4 triangles) and 3 different index buffers for locations, normals, and UVs covering only those 4 triangles, while reusing the original vertex buffers (the same data from the full human model)? I don't want to create tons of UV data that will never be rendered anywhere besides the character's forehead (and I don't want to re-create or duplicate vertex position data for these secondary texture layers/decals).
EDIT:
I realized I didn't properly ask a question, so my question is:
Did I misunderstand the non-interleaved model structure (is it used for some reason other than having non-aligned vertex components)?
Or am I approaching the non-interleaved structure wrong (is there a way to define multiple non-aligned vertex buffers and draw them with only one index buffer)?
The reason you can't have more than one index buffer bound at a time is that you need to specify a fixed number of primitives when you call DrawIndexed.
So in your example, if you had one index buffer with 3,000 primitives and another with 12,000 primitives, the pipeline would have no idea how to match the first set to the second.
It is generally normal that some vertex data (mostly positions) eventually has to be duplicated across your vertex buffers, since the buffers need to be the same size.
The index buffer works as a "lookup table", so your data must be consistent across vertex buffers.
The non-interleaved model structure has several advantages:
First, using separate buffers can lead to better performance if you need to draw the model several times and some draws do not require all attributes.
For example, when you render a shadow map you only need positions, so in interleaved mode you still have to bind a large data structure and access elements non-contiguously (the Input Assembler does that). With non-interleaved data, positions are contiguous in memory, so fetches are much faster.
Non-interleaved data also makes it easier to process individual attributes. It is common nowadays to perform displacement or skinning in a compute shader; in that case you can easily create another Position+Normal buffer, perform the skinning on those, and attach them to the pipeline once processed (while keeping the UV buffer intact).
If you want to draw non-aligned vertex buffers, you could use Structured Buffers instead (and use SV_VertexID plus some custom lookup tables in your shader code).
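The duplication described above can be sketched as follows: given OBJ-style separate position/normal index lists, build the single combined index buffer the pipeline expects by emitting one vertex per unique (position index, normal index) pair. All names are illustrative; a real version would then expand the pairs into concrete vertex buffers:

```javascript
// Collapse separate per-attribute index lists into one shared index buffer,
// duplicating a combined vertex only when a (posIndex, normIndex) pair is new.
function buildSingleIndex(posIdx, normIdx) {
  var seen = {}, indices = [], pairs = [];
  for (var i = 0; i < posIdx.length; i++) {
    var key = posIdx[i] + "/" + normIdx[i];
    if (!(key in seen)) {
      seen[key] = pairs.length;
      pairs.push([posIdx[i], normIdx[i]]);  // one combined vertex per unique pair
    }
    indices.push(seen[key]);
  }
  return { indices: indices, vertices: pairs };
}

// Two triangles sharing positions 0 and 1 but with different normals:
var mesh = buildSingleIndex([0, 1, 2, 0, 1, 3], [0, 0, 0, 1, 1, 1]);
// mesh.vertices has 6 unique combinations; sharing only happens when BOTH
// the position and the normal index match.
```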

Efficiently retrieve Z ordered objects in viewport in 2D game

Imagine a 2D game with a large play area, say 10000x10000 pixels. Now imagine there are thousands of objects spread across this area. All the objects are in a Z order list, so every object has a well-defined position relative to every other object even if they are far away.
Suppose there is a viewport in to this play area showing say a 500x500 area of this play area. Obviously if the algorithm is "for each object in Z order list, if inside viewport, render it" then you waste a lot of time iterating all the thousands of objects far outside the viewport. A better way would be to maintain a Z-ordered list of objects near or inside the viewport.
If both the objects and the viewport are moving, what is an efficient way of maintaining a Z-ordered list of objects which are candidates to draw? This is for a general-purpose game engine so there are not many other assumptions or details you can add in to take advantage of: the problem is pretty much just that.
You do not need to keep your memory layout strictly ordered by Z. Instead, store your objects in a space-partitioning structure oriented along the viewing surface.
A typical partitioning structure in 2D is the quad-tree. You can also use a binary tree, a grid, or a spatial hashing scheme. You can even mix these techniques and nest them one inside another.
There is no single "best"; weigh the ease of writing and maintaining the code against the memory you have available.
Let us consider the grid: it is the simplest to implement, the fastest to access, and the easiest to traverse (traversing meaning moving to neighboring cells).
Imagine you allow yourself 20 MB of RAM for the grid skeleton, and each cell's content is just a small object (like a std::vector or a C# List), say 50 bytes. For a 10k-pixel square surface you then have:
sqrt(20*1024*1024 / 50) = 647
That gives 647 cells per dimension, and therefore cells 10k/647 ≈ 15 pixels wide.
That is still very small, so I suppose perfectly acceptable. You can adjust the numbers to get cells of 512 pixels, for example; it should be a good fit when a few cells cover the viewport.
Then it is trivially easy to determine which cells the viewport activates: divide the top-left corner by the cell size and floor the result; this directly gives you the cell index (provided both your viewport space and grid space start at 0,0; otherwise you need an offset).
Finally, take the bottom-right corner, determine its grid coordinate, and run a double loop (over x and y) between the min and max to iterate over the activated cells.
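The corner-flooring and double loop can be sketched like this, under the stated assumptions (a fixed cellSize and a grid origin at 0,0):

```javascript
// Enumerate the grid cells overlapped by an axis-aligned viewport.
function cellsForViewport(x, y, w, h, cellSize) {
  var x0 = Math.floor(x / cellSize);        // top-left cell coordinate
  var y0 = Math.floor(y / cellSize);
  var x1 = Math.floor((x + w) / cellSize);  // bottom-right cell coordinate
  var y1 = Math.floor((y + h) / cellSize);
  var cells = [];
  for (var cy = y0; cy <= y1; cy++) {
    for (var cx = x0; cx <= x1; cx++) {
      cells.push([cx, cy]);  // each activated cell's grid coordinate
    }
  }
  return cells;
}

// A 500x500 viewport at (600, 600) over 512-pixel cells touches 4 cells.
var active = cellsForViewport(600, 600, 500, 500, 512);
```

Each returned coordinate pair indexes one cell whose stored object list you then draw (deduplicating multi-cell objects per frame as described below).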
When treating a cell, you can draw the objects it contains by walking the list of objects you previously stowed there.
Beware of objects that span 2 or more cells. You have a choice: either store each object only once, in which case your search algorithms must always know the size of the biggest element in the region and also search the lists of neighboring cells (going as far as necessary to cover at least the size of that biggest element);
or store it multiple times (my preferred way) and simply make sure, when you iterate cells, to treat each object only once per frame. This is easily achieved with a frame ID stored in the object structure (as a mutable member).
The same logic applies to more flexible partitions like binary trees.
I have implementations of both available in my engine; check the code out, it may help you with the details: http://sourceforge.net/projects/carnage-engine/
A final word about your Z ordering: if you had separate storage for each Z, then you already did a space partitioning, just not along the right axis.
This can be called layering.
As an optimization, instead of storing lists of objects in your cells, you can store (ordered) maps keyed by Z, so that iteration within a cell proceeds in Z order.
A typical solution to this sort of problem is to group your objects according to their approximate XY location. For example, you can bucket them into 500x500 regions, i.e. objects intersecting [0,500]x[0,500], objects intersecting [500,1000]x[0,500], etc. Very large objects might be listed in multiple buckets, but presumably there are not too many very large objects.
For each viewport, you would need to check at most 4 buckets for objects to render. You will generally only look at about as many objects as you need to render anyway, so it should be efficient. This does require a bit more work updating when you reposition objects. However, assuming that a typical object is only in one bucket, it should still be pretty fast.

Is it okay to send an array of objects to vertex shader?

Due to performance issues drawing thousands of similar triangles (with different attributes), I would like to draw all of them with a single call to drawElements. But in order to draw each triangle with its respective attributes (e.g. world location, color, orientation), I believe I need to send an array buffer to my vertex shader, where the array is a list of all the triangle attributes.
If this approach is indeed the standard way to do it, then I am delighted to know I have the theory correct; I just need to know how to send an array buffer to the shader. Note that I currently know how to send multiple attributes and uniforms (though they are not contiguous in memory, which is what I'm looking for).
If not, I'd appreciate it if a resident expert could point me in the right direction.
I have a related question because I am having trouble actually implementing a VBO.
How to include model matrix to a VBO?
You do have the theory correct, let me just clear a few things...
You can only "send vertex arrays" to the vertex shader through attribute pointers, so that part of the code stays pretty much the same. What you seem to be looking for are two optimisations: putting the whole buffer on the GPU, and using interleaved vertex data.
To put the whole buffer on the GPU you need to use VBOs, as already mentioned in the comment. You create a raw buffer on the GPU into which you can put any data you want, and even modify it at runtime if needed; in your case that would be vertex data. For non-interleaved data you would create a separate buffer each for position, colour, orientation...
To use interleaved data you need to put the attributes into the buffer sequentially. If possible, it is best to create a data structure that holds all the vertex data (of a single vertex, not the whole array) and send those to the buffer (a simple primitive array works as well). A C example of such a structure:
typedef union {
    struct {
        float x, y, z;       // position
        float r, g, b, a;    // color
        float ox, oy, oz;    // orientation
    };
    struct {
        float position[3];
        float color[4];
        float orientation[3];
    };
} Vertex;
What you need to do then is set the correct pointers into this data. In a VBO, position starts at offset 0 (a NULL pointer); to address the colour you would use ((float *)NULL) + 3, or a convenience such as offsetof(Vertex, color) in C. You also need to set the stride, which is the size of the Vertex structure: sizeof(Vertex), or hardcoded, sizeof(float) * (3 + 4 + 3).
After this, all you need to watch for is correct buffer binding/unbinding; the rest of your code should stay exactly the same.
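The offset and stride arithmetic for the Vertex layout above works out as follows (a sketch in JavaScript; the layout object and BYTES_PER_FLOAT are illustrative names):

```javascript
// Byte offsets and stride for an interleaved vertex of
// 3 position + 4 color + 3 orientation floats.
var BYTES_PER_FLOAT = 4;
var layout = {
  stride:      (3 + 4 + 3) * BYTES_PER_FLOAT,  // 40 bytes per vertex
  position:    0,                              // starts at the front
  color:       3 * BYTES_PER_FLOAT,            // after x,y,z  -> 12
  orientation: (3 + 4) * BYTES_PER_FLOAT       // after color  -> 28
};

// With a bound VBO these feed the attribute pointer setup, e.g.:
// gl.vertexAttribPointer(colorLoc, 4, gl.FLOAT, false, layout.stride, layout.color);
```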

Performance of using a large number of small VBOs

I wanted to know whether OpenGL VBOs are meant to be used only for large geometry arrays, or whether it makes sense to use them even for small arrays. I have code that positions a variety of geometry relative to other geometry, but some of the "leaf" geometry objects are quite small, 10x10 quad spheres and the like (200 triangles apiece). I may have a very large number of these small leaf objects. I want to be able to have a different transform matrix for each of these small leaf objects. It seems I have 2 options:
Use a separate VBO for each leaf object. I may end up with a large number of VBOs.
I could store data for multiple objects in one VBO, and apply the appropriate transforms myself when changing the data for a given leaf object. This seems strange because part of the point of OpenGL is to efficiently do large numbers of matrix ops in hardware. I would be doing in software something OpenGL is designed to do very well in hardware.
Is having a large number of VBOs inefficient, or should I just go ahead and choose option 1? Are there any better ways to deal with the situation of having lots of small objects? Should I instead be using vertex arrays or anything like that?
If your data is static and consists of one object, the most optimal solution would be to use display lists and a single VBO for the one mesh. Otherwise, the general rule is to avoid doing anything other than rendering in the main draw loop.
If you'll never need to add or remove an object after initialization (or morph any object) then it's probably more efficient to bind a single buffer permanently and change the stride/offset values to render different objects.
If you've only got a base set of geometry that will remain static, use a single VBO for that and separate VBOs for the geometry that can be added/removed/morphed.
If you can drop objects in or remove them at will, each object should have its own VBO to make memory management much simpler.
Some good info is located at: http://www.opengl.org/wiki/Vertex_Specification_Best_Practices
I think 200 triangles per mesh is not such a small number, and the performance with one VBO per mesh may not decrease much. Unfortunately, it depends on the hardware.
One huge buffer will not gain you a huge performance difference either. I think the best option would be to store several (but not all) objects per VBO.
Rendering using one buffer:
There is no problem with that. You simply have one buffer bound, and then call glDrawArrays with different parameters. For instance, if one mesh consists of 100 vertices and the buffer holds 10 such meshes, you can use:
glDrawArrays(GL_TRIANGLES, 0, 100);
glDrawArrays(GL_TRIANGLES, 100, 100);
glDrawArrays(GL_TRIANGLES, ..., 100);
glDrawArrays(GL_TRIANGLES, 900, 100);
That way you minimize buffer changes and can still render quite efficiently.
Are those "small" objects the same? Do they have the same geometry but different transformations/materials? Then maybe it is worth using instancing.

Moving object Opengl Es 2.0

I am a bit confused about how to move my basic square: should I use a translation matrix or just change the object's vertices? Which one is accurate?
My vertex shader does:
gl_Position = myPMVMatrix * a_vertex;
and I also use a VBO.
From an accuracy point of view both methods are about equally good.
From a performance point of view, it's about minimizing bottlenecks:
For a single square you probably won't be able to measure any difference, but when you think about 1 million squares (or triangles), things get a little more complicated:
If all of your triangles change position relative to each other, you are probably better off changing the VBO, because you can push the data directly to the graphics card's memory instead of issuing a million OpenGL calls (which are very slow).
If all your triangles keep the same position relative to each other (as in a normal 3D model), you should just change the transformation matrix. In this case you don't have to push the data onto the GPU memory again; you have a single function call and transfer only a few bytes of data.
Depending on your application, it may be a good choice to divide your triangles into categories and update each appropriately.
Don't move objects by changing all of the vertices! What about a complex model with thousands of vertices? Even if it's a simple square, don't fall into such bad practice. That's exactly what transformation matrices are for.
You are already using a transformation matrix in your shader code. From the naming I assume it's a premultiplied model-view-projection matrix. It consists of the model matrix positioning the object in world space (this is where your translation usually goes), the view matrix positioning the world in eye/camera space (sometimes model and view matrix are combined into a single modelview matrix, as in fixed-function GL), and the projection matrix doing any kind of perspective projection and/or transformation to the clipping volume, all three multiplied together as P * V * M.
If you still have questions about these transformation matrices and their use, consult some literature on 3D transformations or just your favourite OpenGL tutorial.
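As a sketch of the matrix route, here is a column-major translation matrix applied to a point, the way M * v works in the shader; mat4Translate and transformPoint are illustrative helpers, not from the question:

```javascript
// Build a column-major 4x4 translation matrix (OpenGL convention).
function mat4Translate(tx, ty, tz) {
  return [1, 0, 0, 0,
          0, 1, 0, 0,
          0, 0, 1, 0,
          tx, ty, tz, 1];  // translation lives in the last column
}

// Multiply a point (w = 1 implied) by a column-major 4x4 matrix.
function transformPoint(m, p) {
  return [
    m[0] * p[0] + m[4] * p[1] + m[8]  * p[2] + m[12],
    m[1] * p[0] + m[5] * p[1] + m[9]  * p[2] + m[13],
    m[2] * p[0] + m[6] * p[1] + m[10] * p[2] + m[14]
  ];
}

// Upload mat4Translate(2, 3, 0) as the model matrix each frame: the square's
// vertex (1, 1, 0) lands at (3, 4, 0) without ever touching the VBO.
var moved = transformPoint(mat4Translate(2, 3, 0), [1, 1, 0]);
```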
