I'm obviously new to OpenGL; I was wondering whether it is possible to use a VBO with multiple normal vectors per vertex. My current vertex array order looks like:
    j = [x,y,z,r,g,b,a,n1x,n1y,n1z,n2x,n2y,n2z,n3x,n3y,n3z....]
This method requires the shaders to distinguish which normal vector to use, and that is what is causing the problem. Any suggestions would be great.
I'm also looking for tutorials on using multiple IBOs and VBOs; most tutorials only seem to use one.
You can make the interleaved layout work by supplying the appropriate stride and offsets to glVertexAttribPointer(); that is also how it handles different amounts of data for the vertex and texture coordinates, for example. Or, you could use separate VBOs for the vertex, normal and texture coordinates, rather than interleaving them.
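For instance, here is a minimal sketch of the interleaved case, assuming the 16-float layout from the question; the attribute locations passed in are assumptions you would obtain yourself from glGetAttribLocation() on your own program:

    #include <GLES2/gl2.h>

    /* Sketch: bind the interleaved layout [x,y,z, r,g,b,a, n1, n2, n3]
     * as five separate attributes; the shader then picks which normal
     * to use. All *_loc values are assumed, not part of any API. */
    static void setup_attribs(GLuint vbo, GLint pos_loc, GLint color_loc,
                              GLint n1_loc, GLint n2_loc, GLint n3_loc)
    {
        const GLsizei stride = 16 * sizeof(GLfloat); /* 3+4+3+3+3 floats */

        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glVertexAttribPointer(pos_loc,   3, GL_FLOAT, GL_FALSE, stride, (void *)(0  * sizeof(GLfloat)));
        glVertexAttribPointer(color_loc, 4, GL_FLOAT, GL_FALSE, stride, (void *)(3  * sizeof(GLfloat)));
        glVertexAttribPointer(n1_loc,    3, GL_FLOAT, GL_FALSE, stride, (void *)(7  * sizeof(GLfloat)));
        glVertexAttribPointer(n2_loc,    3, GL_FLOAT, GL_FALSE, stride, (void *)(10 * sizeof(GLfloat)));
        glVertexAttribPointer(n3_loc,    3, GL_FLOAT, GL_FALSE, stride, (void *)(13 * sizeof(GLfloat)));

        glEnableVertexAttribArray(pos_loc);
        glEnableVertexAttribArray(color_loc);
        glEnableVertexAttribArray(n1_loc);
        glEnableVertexAttribArray(n2_loc);
        glEnableVertexAttribArray(n3_loc);
    }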
You can find better examples of using VBOs in the PowerVR SDK, which is a free download at:
http://www.imgtec.com/powervr/insider/sdkdownloads/
This is for an OpenGL ES 2.0 game on Android, though I suspect the right answer is generic to any OpenGL situation.
TL;DR - is it better to send N data to the GPU once and then make K draw calls with it, or to send K*N data to the GPU once and make 1 draw call?
More details
I'm wondering about best practices for my situation. I have a dynamic mesh whose vertices I recompute every frame - think of it as a water surface - and I need to project these vertices onto K different quads in my game. (In each case the projection is slightly different; sparing the details, you could imagine them as K different mirrors surrounding the mesh.) K is on the order of 10-25; I'm still working that out.
I can think of two broad options:
1. Bind the mesh as is, and call draw K different times, either changing a uniform for shaders or messing with the fixed-function state, to render to the correct quad in place (on the screen) or to different segments of a texture (which I can later use when rendering the quads to achieve the same effect).
2. Duplicate all the vertices in the mesh K times, essentially making a single vertex stream with K meshes in it, add an attribute (or a few) indicating which quad each mesh clone is supposed to project onto (and how to get there), and use vertex shaders to project. I would make one call to draw, but send K times as much data.
The Question: of those two options, which is generally better performance-wise?
(Additionally: is there a better way to do this?
I had considered a third option, where I render the mesh details to a texture and create my K-clone geometry as a sort of dummy stream that I could bind once and for all, with the vertex shader looking up each vertex's real data in that texture; but I've been told that texture support in vertex shaders is poor or simply unavailable in OpenGL ES 2.0, and I would prefer to avoid that route.)
There is no perfect answer to this question, though I would suggest you think about the nature of real-time computer graphics and the OpenGL pipeline. Although "the GL" is required to produce results that are consistent with in-order execution, the reality is that GPUs are highly parallel beasts. They employ lots of tricks that work best if you actually have many unrelated tasks going on at the same time (some even split the whole pipeline up into discrete tiles). GDDR memory, for instance, has really high latency, so for efficiency GPUs need to be able to schedule other jobs to keep the stream processors (shader units) busy while memory is fetched for a job that is just starting.
If you are recomputing parts of your mesh each frame, then you will almost certainly want to favor more draw calls over massive CPU->GPU data transfers every frame. Saturating the bus with unnecessary data transfers plagues even PCI Express hardware (it is far more costly than the overhead several additional draw calls would ever add), and it can only get worse on embedded OpenGL ES systems. Having said that, there is no reason you could not simply use glBufferSubData (...) to stream in only the affected portions of your mesh and continue to draw the entire mesh in a single draw call.
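A minimal sketch of that idea, assuming a hypothetical Vertex struct and some dirty-range bookkeeping (first_dirty, dirty_count) that your own simulation step would maintain:

    #include <GLES2/gl2.h>

    typedef struct { GLfloat x, y, z; } Vertex; /* assumed layout */

    /* Sketch: re-upload only the recomputed range of the mesh, then
     * still draw the whole thing with one call. */
    static void update_and_draw(GLuint vbo, const Vertex *verts,
                                int first_dirty, int dirty_count,
                                int total_count)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferSubData(GL_ARRAY_BUFFER,
                        first_dirty * sizeof(Vertex),  /* byte offset */
                        dirty_count * sizeof(Vertex),  /* byte size   */
                        verts + first_dirty);
        /* ...attribute setup elided; one call draws the entire mesh: */
        glDrawArrays(GL_TRIANGLES, 0, total_count);
    }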
You might get better cache coherency if you split the buffer (or partition the data within it) and/or the draw calls, depending on your actual use case. The only way to tell decisively which will work better in your case is to profile your software on your target hardware. But all of this fails to address the bigger picture, which is: "Why am I doing this on the CPU?!"
It sounds like what you really want is simply vertex instancing. If you can re-work your algorithm to run completely in vertex shaders by passing instance IDs, you should see a massive improvement over all of the solutions I have seen you propose so far (true instancing is actually somewhere between what you described in solutions 1 and 2) :)
The actual concept of instancing is very simple and will give you benefits whether your particular version of the OpenGL API supports it at the API level or not (you can always implement it manually with vertex attributes and extra vertex buffer data). The thing is, you would not have to duplicate your data at all if you implement instancing correctly. The extra data necessary to identify each individual vertex is static, and you can always change a shader uniform and make an additional draw call (this is probably what you will have to do with OpenGL ES 2.0, since it does not offer glDrawElementsInstanced) without touching any vertex data.
You certainly will not have to duplicate your vertices to get K*N data; your buffer space complexity would be more like O(N + N*M), where M is the number of new components you had to add to uniquely identify each vertex so that you can calculate everything on the GPU. For "instance," you might need to number each of the vertices in your quad 1-4 and process a vertex differently in your shader depending on which corner it is. In this case the M coefficient is 1, and it does not change no matter how many instances of your quad you need to dynamically calculate each frame; K would determine the number of draw calls in OpenGL ES 2.0, not the size of your data. None of this additional storage space would be necessary if OpenGL ES 2.0 supported gl_VertexID :(
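As a rough illustration of the manual approach described above: one copy of the mesh in the buffer, one draw call per instance, and the only thing that changes between calls is a uniform. All names here (u_transform_loc, transforms, etc.) are assumptions, not part of any API:

    #include <GLES2/gl2.h>

    /* Sketch of "manual instancing" in ES 2.0: the vertex shader
     * projects the shared mesh onto quad k using a per-instance
     * transform uniform. */
    static void draw_instances(GLint u_transform_loc,
                               const GLfloat transforms[][16],
                               int K, GLsizei index_count)
    {
        for (int k = 0; k < K; ++k) {
            glUniformMatrix4fv(u_transform_loc, 1, GL_FALSE, transforms[k]);
            glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT, 0);
        }
    }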
Instancing is the best way to make effective use of the highly parallel GPU and avoid CPU/GPU synchronization and slow bus transfers. Even though OpenGL ES 2.0 does not support instancing in the API sense, multiple draw calls using the same vertex buffer, where the only thing you change between calls is a couple of shader uniforms, is often preferable to computing your vertices on the CPU and uploading new vertex data every frame, or to having your vertex buffer's size depend directly on the number of instances you intend to draw (yuck). You'll have to try it out and see what your hardware likes.
Instancing would be what you are looking for, but unfortunately it is not available in OpenGL ES 2.0. I would be in favor of sending all the vertices to the GPU and making one draw call, provided all your assets fit into GPU memory. In my experience, reducing draw calls from 100+ to 1 took performance from 15 fps to 60 fps.
To minimize the number of state changes, I should sort the drawing order of my meshes. But if I have multiple meshes using multiple shaders, I need to choose whether to sort by vertex attribute bindings or by shader uniform parameters.
What should I choose? I'm inclined to minimize vertex attribute changes because of GPU cache hit rates, but I have no idea what a shader change costs. What are the usual concerns when deciding drawing order? Is there some basis on which to make this choice?
PS. I'm targeting iOS/PowerVR SGX chips.
Edit
I decided to go with sort-by-material, because many meshes will use just a few materials while there are a bunch of meshes to draw. This means I will have more opportunities to share materials than to share meshes, and so a better chance of decreasing the state-change count. I'm not sure about this, though, so if you have a better suggestion, please let me know.
You don't need to depth-sort opaque objects on the PowerVR SGX, since it uses order-independent, pixel-perfect hidden surface removal.
Depth-sort only to achieve proper transparency/translucency rendering.
The best practice on SGX is to sort by state, in the following order:
Viewport
Framebuffer
Shader
Textures
Clipping, Blending etc.
Texture state change can be significantly reduced by using texture atlases.
The number of draw calls can be reduced by batching.
Those are just the golden rules; remember that you should profile first and then optimize :)
See:
http://www.imgtec.com/powervr/insider/docs/PowerVR.Performance%20Recommendations.1.0.28.External.pdf
Use something like this: http://realtimecollisiondetection.net/blog/?p=86. Then you can change which parts of the key are sorted first in your code at run time to achieve the best speed per device.
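To make that concrete, here is a minimal sketch of such a packed sort key in the spirit of the linked post: costlier state lives in the higher bits, so sorting the keys yields a draw order that minimizes expensive changes. The field widths here are arbitrary assumptions; tune them to your engine:

    #include <stdint.h>

    /* Sketch: pack state IDs into one 64-bit key; sort draw items by
     * this key before issuing GL calls. */
    static uint64_t make_sort_key(uint8_t viewport, uint8_t framebuffer,
                                  uint16_t shader, uint16_t texture,
                                  uint16_t depth)
    {
        return ((uint64_t)viewport    << 56) |
               ((uint64_t)framebuffer << 48) |
               ((uint64_t)shader      << 32) |
               ((uint64_t)texture     << 16) |
                (uint64_t)depth;
    }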
I'm adding an OpenGL renderer to my 2D game engine, and I want to know whether there is a way to apply an MVP matrix to only part of the vertices in a single draw call.
I'm planning to group draw calls by texture, so I'll pass a buffer with many vertices and texcoords; now I want to apply different rotation angles to different quads. Is there a way to accomplish this in the shader, or should I give up on the MVP matrix in the shader and do the same thing on the CPU?
EDIT: What about adding 3 float attributes (rotation and rot_center.xy) per vertex?
What gives better performance:
(1) doing the rotation on the CPU,
(2) providing 3 more floats per vertex, or
(3) separate draw calls?
Is there any other option?
Here is a possibility:
1. Do the rotation in the vertex shader. Pass in the information (angle?) needed to create the rotation matrix as a vertex attribute.
2. Pass in a vertex attribute (ubyte) that is effectively a per-vertex boolean flag. Rotation in #1 will be executed only if the bool is set.
Not sure if the above will work for you from a performance/storage perspective.
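To illustrate, here is a sketch of that idea as an ES 2.0 vertex shader, kept here as a C string. The attribute names (a_angle, a_center, a_rotate) are assumptions; a_rotate is the per-vertex flag from #2:

    /* Sketch: rotate around a per-vertex center only when the flag is
     * set, then apply the shared MVP. */
    static const char *rotate_vs =
        "attribute vec2  a_position;                    \n"
        "attribute vec2  a_center;   /* rot_center */   \n"
        "attribute float a_angle;                       \n"
        "attribute float a_rotate;   /* 0 or 1 */       \n"
        "uniform   mat4  u_mvp;                         \n"
        "void main() {                                  \n"
        "    vec2 p = a_position;                       \n"
        "    if (a_rotate > 0.5) {                      \n"
        "        float c = cos(a_angle);                \n"
        "        float s = sin(a_angle);                \n"
        "        vec2 d = p - a_center;                 \n"
        "        p = a_center + vec2(c * d.x - s * d.y, \n"
        "                            s * d.x + c * d.y);\n"
        "    }                                          \n"
        "    gl_Position = u_mvp * vec4(p, 0.0, 1.0);   \n"
        "}";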
I think that, while grouping draw calls is a good thing for many different performance reasons, changing your code to satisfy a requirement as basic as rotation is not a good idea.
Draw batching is a good thing, but if you are forced to keep an additional attribute (you certainly cannot do it with uniforms, since you would not have per-entity information), it is not worth it.
An additional attribute means much more memory bandwidth usage, which is usually the main performance killer on today's systems.
Draw batching, on the other hand, is important but not always critical; it depends on many factors, such as:
the GPU driver's OpenGL optimizations
the GPU's tile configuration
the number of shapes/draw calls we are talking about (if you have 20 quads on the screen, why should you bother with batching? :) )
In other words, it is often much more convenient to drop extreme batching in favor of simplicity/maintainability, and to avoid fancy solutions for requirements as simple as rotation.
I hope this helps in some way.
Use two different objects, that is all!
There is no other workaround for rotating only part of an object.
Example:
A game with a tank, where you want to rotate the turret and the body separately. As in your case here, these two are treated as separate objects.
I have been trying for some time now to use vertex buffer objects to render a texture on the screen. I have a working function here that uses the classic method:
https://github.com/batiste/sdl2-opengl-es/blob/master/common.c#L546
This first method works. A bit further down is the version modified to use a vertex buffer:
https://github.com/batiste/sdl2-opengl-es/blob/master/common.c#L586
I have tried many different ways, checked all the inputs, and searched this site for similar problems, but without success. I need a fresh, expert eye on this.
The second part of the question is about performance. I want to use this to display some simple textures on my Android phone. What kind of speed-up can I expect from using a vertex buffer? Is it really worth using for 2 triangles?
glVertexAttribPointer's last parameter is a byte offset into the buffer, so it looks like there's a sizeof(GLfloat) factor missing.
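In other words, something along these lines; an interleaved [x,y,z, u,v] layout is assumed here for illustration, and the real layout in the linked code may differ:

    #include <GLES2/gl2.h>

    /* Sketch of the likely fix: with a VBO bound, the last argument to
     * glVertexAttribPointer() is a byte offset, not a float index. */
    static void point_attribs(GLint position_loc, GLint texcoord_loc)
    {
        const GLsizei stride = 5 * sizeof(GLfloat);
        glVertexAttribPointer(position_loc, 3, GL_FLOAT, GL_FALSE, stride,
                              (void *)0);
        glVertexAttribPointer(texcoord_loc, 2, GL_FLOAT, GL_FALSE, stride,
                              (void *)(3 * sizeof(GLfloat))); /* not (void *)3 */
    }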
I'm new to OpenGL ES 2.0 and its programmable pipeline, and I'm porting an application that renders many objects, all with different textures.
Does this mean calling glDrawArrays for each object and changing textures between the calls? Or is there another way to draw multiple objects with different textures in a single glDrawArrays call?
I'm asking because I noticed that making many glDrawArrays calls was MUCH slower than glBegin/glEnd when I tried them with 'desktop' OpenGL.
I'm rendering map tiles, so ALL the textures are different; they are loaded dynamically (I can't spend much time processing them, as I could if they were loaded just once), and they are quite large (up to 512x512).
Unfortunately, there is not a simple built-in way to apply multiple textures in a single batch glDrawArrays call. There are, however, ways to make it work. One of the most common strategies is known as a Texture Atlas. Basically, the idea is to combine many images together into one larger texture, with each sub-image occupying a known sub-rectangle of the texture. When you map those onto your primitives, you supply the coordinates of the sub-rectangle corresponding to the image you want to display.
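For illustration, here is a sketch of the sub-rectangle texcoord math, assuming hypothetical 512x512 tiles packed into a 2048x2048 atlas; tile_x and tile_y pick which tile of the grid you want:

    #include <GLES2/gl2.h>

    /* Sketch: compute the texcoords of one tile inside the atlas, laid
     * out as a quad (triangle strip). Atlas and tile sizes are assumed. */
    static void tile_uvs(int tile_x, int tile_y, GLfloat out[8])
    {
        const float TILE = 512.0f, ATLAS = 2048.0f;
        float u0 = (tile_x * TILE) / ATLAS;   /* left edge   */
        float v0 = (tile_y * TILE) / ATLAS;   /* top edge    */
        float u1 = u0 + TILE / ATLAS;         /* right edge  */
        float v1 = v0 + TILE / ATLAS;         /* bottom edge */

        out[0] = u0; out[1] = v0;
        out[2] = u1; out[3] = v0;
        out[4] = u0; out[5] = v1;
        out[6] = u1; out[7] = v1;
    }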
A texture atlas will work in a large number of cases, but can be comparatively complex to set up. If you don't have to do a different texture for every single object, the first thing to try would be to simply batch together as many primitives that use the same texturing as possible.
If you were not using OpenGL ES, you might also look into using Texture Arrays, if your textures are all of the same size.