I am developing a hyperbolic graph for visualizing trees with a large number of nodes, so I am using WebGL with the Three.js library to improve performance. You can check what I have developed so far here: http://hyperbrowser.herokuapp.com/. The idea is to be able to interact with the graph: clicking a node centers the graph on that node, and dragging the mouse moves the graph around.
I have been able to display up to 100,000 nodes. But when you drag with that many nodes, performance drops. I think that is because I am currently doing all the operations in JavaScript itself and then updating the vertex positions of my THREE.PointCloud.
After some research I came up with the idea of performing the operations in the vertex shader, directly on the vertices, passing the parameters for the specific transformations as either uniforms or attributes. This seems viable, so I want to ask:
Whether this is the proper approach or not
And if it is, since the transformations I am applying are functions of complex numbers, is there a way to perform these operations in the shaders themselves?
It turns out the way is just implementing the math operations directly: https://github.com/julesb/glsl-util
All the code is in https://github.com/vabada/hyperBrowser/ in case you want to see how I perform any specific operation. Of course, any tips, ideas and advice are more than welcome.
So far, I've managed to implement the same solution performing all the operations in the vertex shader. Performance is similar, so I'll probably go back to doing the operations in JavaScript. However, it was fun to experiment with shaders, so here is the code for the vertex shader, in case it helps someone.
First of all, the functions implemented to work with complex numbers (thanks to julesb):
// Define complex operations
#define product(a, b) vec2(a.x*b.x - a.y*b.y, a.x*b.y + a.y*b.x)
#define conjugate(a) vec2(a.x, -a.y)
#define divide(a, b) vec2((a.x*b.x + a.y*b.y) / (b.x*b.x + b.y*b.y), (a.y*b.x - a.x*b.y) / (b.x*b.x + b.y*b.y))
And then the transformation performed in the vertex shader:
uniform vec2 t;

void main() {
    vec2 z = vec2(position.x, position.y);
    vec2 newPos = divide((z + t), (vec2(1.0, 0.0) + product(conjugate(t), z)));
    gl_Position = projectionMatrix * modelViewMatrix * vec4(newPos, 0.0, 1.0);
}
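As a side note, the operations above are preprocessor macros, so the arguments are pasted in textually and an expression like conjugate(t) ends up repeated inside the expansion. An equivalent function-based sketch (my own naming, not necessarily matching the linked repository) avoids that:

// Complex multiplication: (a.x + i*a.y) * (b.x + i*b.y)
vec2 cmul(vec2 a, vec2 b) {
    return vec2(a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x);
}

// Complex conjugate
vec2 cconj(vec2 a) {
    return vec2(a.x, -a.y);
}

// Complex division: a / b = a * conj(b) / |b|^2
vec2 cdiv(vec2 a, vec2 b) {
    float d = b.x * b.x + b.y * b.y;
    return vec2(a.x * b.x + a.y * b.y, a.y * b.x - a.x * b.y) / d;
}

With these, the newPos line above would read cdiv(z + t, vec2(1.0, 0.0) + cmul(cconj(t), z)).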
Related
I have two questions about GLSL efficiency.
In the fully user-defined shader pipeline
vs -> tcs -> tes -> gs -> fs
the first four stages can be used for an operation like this:
gl_Position = MVP_matrices * vec4(in_pos, 1.0);
Which stage is the most efficient place for it? Is it hardware- or version-dependent?
Many GLSL tutorials show examples that pass the vertex position between shaders instead of using only the built-in variable gl_Position.
Does that make sense in terms of efficiency?
Thank you!
Such transforms are commonly done in the VS
That is because the geometry and tessellation stages are not usually present in basic shaders. And doing this in the fragment shader would mean multiplying on a per-fragment basis, which occurs much more often than per-vertex, hence the performance drop... So people are used to placing such transforms in the VS and do not think about it too much.
Custom input/output variables
We sometimes need vertices in more than one coordinate system, and it is usually faster to use the built-in interpolators than to transform on a per-fragment basis.
For example, I sometimes need three coordinate systems at once (screen, world, TBN) for proper computations in the FS.
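For illustration only, here is a minimal sketch of a vertex shader that forwards an extra coordinate system to the FS through the interpolators; all names (m_model, m_viewProj, world_pos, world_normal) are made up, not taken from any particular source:

#version 330 core
layout(location = 0) in vec3 in_pos;
layout(location = 1) in vec3 in_normal;

uniform mat4 m_model;      // model -> world
uniform mat4 m_viewProj;   // world -> clip (screen)

out vec3 world_pos;        // interpolated world-space position for the FS
out vec3 world_normal;     // interpolated world-space normal for the FS

void main()
{
    vec4 w = m_model * vec4(in_pos, 1.0);
    world_pos = w.xyz;
    world_normal = mat3(m_model) * in_normal;   // fine for rotation + uniform scale
    gl_Position = m_viewProj * w;               // screen/clip space via gl_Position
}

The FS then receives both world-space values already interpolated, instead of having to reconstruct them per fragment.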
Another thing to consider is accuracy; see:
How to correctly linearize depth in OpenGL
ray and ellipsoid intersection accuracy improvement
I am developing an OpenGL ES 2.0 application. Time to render is a crucial thing for me. My application has both static and dynamic objects (moving pointers in a dial). To improve performance I want to reduce the computational burden on the CPU, so I want to compute the transformation matrices (translate, rotate, scale) in the shader itself instead of computing them on the CPU (only for dynamic objects). Is it advisable to do this to improve performance?
My vertex shader will look like this:
vertex.c

attribute vec3 vert;
uniform int mode;    // value sent from the app: 0 = static, 1 = dynamic ("static" is a reserved word in GLSL)
uniform mat4 mvp;    // full MVP, calculated in the app (used directly for static objects)

void main()
{
    if (mode == 0)   // static rendering
    {
        gl_Position = mvp * vec4(vert, 1.0);   // MVP is calculated in the app and sent to the shader
    }
    if (mode == 1)   // dynamic objects
    {
        // compute translation matrix
        // compute rotation matrix
        // compute scaling matrix
        gl_Position = mvp * vec4(vert, 1.0);   // here mvp would instead be calculated in the shader
    }
}
The calculation burden will then be on the GPU instead of the CPU, but only in the case of dynamic objects. Will this improve the time to render the application? I need a better FPS.
Generally speaking, there is no precise answer to your question because it really depends on many factors. Particle systems, for example, tend to be very expensive because many calculations are often done inside a shader. So if you wanted to produce hundreds of particles, it could be better to compute the MVP matrix for your dynamic objects on the CPU in order not to load the GPU even more.
Basically, you should find a balance between your app's processing on the CPU and OpenGL ES processing on the GPU, and consider what else you would like to render in your app in the future, how much of it, etc.
There are however some brilliant tips that could help you optimize your shaders:
Best practices for shaders
The interesting part for you begins at "Respect the Hardware Limits on Shaders" and continues to the bottom. A few things could be improved in your shader:
Use constants instead of immediate values in your shaders, especially if you use the same immediate value a few times; the code might be less readable but surely more efficient. For example, create a const above your main():
const float ONE = 1.0;
and use it instead of literal 1.0 values.
Branches and conditionals are inefficient, so it might be better to split your shader into two separate shaders:
For static rendering:
void main()
{
    gl_Position = mvp * vec4(vert, 1.0);
}
For dynamic rendering:
void main()
{
    // compute translation matrix
    // compute rotation matrix
    // compute scaling matrix
    gl_Position = mvp * vec4(vert, 1.0);
}
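To make the dynamic case concrete, here is one possible sketch of building the matrices in the vertex shader. The uniform names (u_viewProjection, u_angle, u_translation, u_scale) and the choice of rotation around Z are assumptions for illustration, not part of the original question:

attribute vec3 vert;

uniform mat4 u_viewProjection;   // still computed in the app
uniform float u_angle;           // e.g. the dial pointer's rotation around Z (assumed)
uniform vec3 u_translation;
uniform vec3 u_scale;

void main()
{
    float c = cos(u_angle);
    float s = sin(u_angle);

    // GLSL matrix constructors take columns, so each vec4 below is one column.
    mat4 translate = mat4(vec4(1.0, 0.0, 0.0, 0.0),
                          vec4(0.0, 1.0, 0.0, 0.0),
                          vec4(0.0, 0.0, 1.0, 0.0),
                          vec4(u_translation, 1.0));

    mat4 rotateZ = mat4(vec4(  c,   s, 0.0, 0.0),
                        vec4( -s,   c, 0.0, 0.0),
                        vec4(0.0, 0.0, 1.0, 0.0),
                        vec4(0.0, 0.0, 0.0, 1.0));

    mat4 scale = mat4(vec4(u_scale.x, 0.0, 0.0, 0.0),
                      vec4(0.0, u_scale.y, 0.0, 0.0),
                      vec4(0.0, 0.0, u_scale.z, 0.0),
                      vec4(0.0, 0.0, 0.0, 1.0));

    gl_Position = u_viewProjection * translate * rotateZ * scale * vec4(vert, 1.0);
}

Note that these matrices are rebuilt for every vertex, so for meshes with many vertices this trades a small amount of CPU work for a lot of redundant GPU work, which is exactly the balance discussed above.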
When passing a value that does not vary per vertex to my vertex shader, I have the option of specifying it either as a uniform or as a constant vertex attribute (using glVertexAttrib1f and friends).
What are the reasons for choosing one over the other? Is it simply that there are a limited number of available vertex attributes and uniforms on any given implementation, and thus I need to choose wisely, or are there also performance implications?
I've done some looking around and found a few discussions, but nothing that answers my concerns concretely:
- http://www.khronos.org/message_boards/showthread.php/7134-Difference-between-uniform-and-constant-vertex-attribute
- https://gamedev.stackexchange.com/questions/44024/what-is-the-difference-between-constant-vertex-attributes-and-uniforms
I'm by no means an OpenGL guru, so my apologies if I'm simply missing something fundamental.
Well, vertex attributes can be set up to vary per-vertex if you pass a vertex attribute pointer; you can swap between a constant value and per-vertex variation on the fly simply by changing how you give data to a particular generic attribute location.
Uniforms can never vary per-vertex; they are far more constant. Generally, GLSL ES guarantees you far fewer vertex attribute slots (8, with up to 4 components each) than uniform components (128 vectors of 4 components each); most implementations exceed these minimums, but the trend is the same (more uniforms than attributes).
Furthermore, uniforms are a per-program state. These are constants that can be accessed from any stage of your GLSL program. In OpenGL ES 2.0 this means Vertex / Fragment shader, but in desktop GL this means Vertex, Fragment, Geometry, Tessellation Control, Tessellation Evaluation.
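On the GLSL side the two options look like this sketch (a_tint and u_tint are hypothetical names); the shader is identical whether the attribute is fed per-vertex or as a constant, the difference lies entirely in the API calls the application makes:

attribute vec4 a_position;
attribute vec4 a_tint;    // constant via glVertexAttrib4f, or per-vertex via glVertexAttribPointer
uniform   vec4 u_tint;    // always constant for the draw call, set with glUniform4f

varying vec4 v_color;

void main()
{
    v_color = a_tint * u_tint;   // either one can carry a per-draw constant
    gl_Position = a_position;
}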
So I'm working with a 2D skeletal animation system.
There are X bones; each bone has at least one part (a quad, i.e. two triangles). On average I have maybe 20 bones and 30 parts. Most bones depend on a parent, and the bones move every frame. There are up to 1,000 frames in total per animation, and I'm using about 50 animations, so a total of around 50,000 frames are loaded in memory at any one time. The parts differ between instances of the skeleton.
The first approach I took was to calculate the position/rotation of each bone, and build up a vertex array, which consisted of this, for each part:
[x1,y1,u1,v1],[x2,y2,u2,v2],[x3,y3,u3,v3],[x4,y4,u4,v4]
And pass this through to glDrawElements each frame.
This looks fine, covers all the scenarios I need, and doesn't use much memory, but it performs like a dog. On an iPod 4 I could get maybe 15 fps with 10 of these skeletons being rendered.
I worked out that most of the performance was being eaten up by copying so much vertex data each frame. So I went to the other extreme and "pre-calculated" the animations, building up a vertex buffer at the start for each character that contained the XYUV coordinates of every frame, for every part, of a single character. Then I calculate the index of the frame that should be used at a particular time, along with a delta value, which is passed to the shader and used to interpolate between the current and the next frame's XY positions.
The vertices looked like this, per frame:
[--------------------- Frame 1 ---------------------],[------- Frame 2 ------]
[x1,y1,u1,v1,boneIndex],[x2, ...],[x3, ...],[x4, ...],[x1, ...][x2, ...][....]
The vertex shader looks like this:
attribute vec4 a_position;
attribute vec4 a_nextPosition;
attribute vec2 a_texCoords;
attribute float a_boneIndex;

uniform mat4 u_projectionViewMatrix;
uniform float u_boneAlpha[255];

varying vec2 v_texCoords;

void main() {
    float alpha = u_boneAlpha[int(a_boneIndex)];
    vec4 position = mix(a_position, a_nextPosition, alpha);
    gl_Position = u_projectionViewMatrix * position;
    v_texCoords = a_texCoords;
}
Now performance is great: with 10 of these on screen, it sits comfortably at 50 fps. But it uses a metric ton of memory. I've optimized that by losing some precision on XYUV, which are now ushorts.
There's also the problem that the bone dependencies are lost. If there are two bones, a parent and a child, and the child has keyframes at 0s and 2s while the parent has keyframes at 0s, 0.5s, 1.5s, and 2s, then the child won't be changed between 0.5s and 1.5s as it should.
I came up with a solution to this bone problem: forcing the child to have keyframes at the same points as its parent. But this uses even more memory and basically defeats the point of the bone hierarchy.
This is where I'm at now. I'm trying to find a balance between performance and memory usage. I know there is a lot of redundant information here (UV coordinates are identical for all the frames of a particular part, yet repeated ~30 times), and a new buffer has to be created for every set of parts (which have unique XYUV coordinates; positions change because different parts are different sizes).
Right now I'm going to try setting up one vertex array per character that has the XYUV for all parts, calculating the matrices for each part, and repositioning the parts in the shader. I know this will work, but I'm worried that the performance won't be any better than just uploading the XYUVs for each frame, which is what I was doing at the start.
Is there a better way to do this without losing the performance I've gained?
Are there any wild ideas I could try?
The better way to do this is to transform your 30 parts on the fly, not make thousands of copies of your parts in different positions. Your vertex buffer will contain one copy of your vertex data, saving tons of memory. Each frame can then be represented by a set of transformations, passed as uniforms to your vertex shader, for each bone you draw with a call to glDrawElements(). Each dependent bone's transformation is built relative to its parent bone. Then, depending on where on the continuum between hand-crafted and procedurally generated you want your animations to be, your sets of transforms can take more or less space and CPU computing time.
Jason L. McKesson's free book, Learning Modern 3D Graphics Programming, gives a good explanation on how to accomplish this in chapter 6. The example program at the end of this chapter shows how to use a matrix stack to implement a hierarchical model. I have an OpenGL ES 2.0 on iOS port of this program available.
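A sketch of what such a vertex shader could look like, reusing the a_boneIndex idea from the question; the uniform array name u_boneTransforms and the 20-bone size are assumptions for illustration:

attribute vec2 a_position;     // part-local position: one shared copy of the geometry, never rewritten
attribute vec2 a_texCoords;
attribute float a_boneIndex;

uniform mat4 u_projectionViewMatrix;
uniform mat4 u_boneTransforms[20];   // per-bone transforms, rebuilt on the CPU each frame

varying vec2 v_texCoords;

void main() {
    mat4 bone = u_boneTransforms[int(a_boneIndex)];
    gl_Position = u_projectionViewMatrix * bone * vec4(a_position, 0.0, 1.0);
    v_texCoords = a_texCoords;
}

With this layout, each frame you only upload around 20 matrices per character instead of rewriting the whole vertex buffer.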
Like many 3D graphics programs, I have a bunch of objects that each have their own model coordinates (from -1 to 1 on the x, y, and z axes). Then I have a matrix that takes them from model coordinates to world coordinates (using the location, rotation, and scale of the object being drawn). Finally, I have a second matrix that turns those world coordinates into the canonical coordinates that OpenGL ES 2.0 will use to draw to the screen.
So, because one object can contain many vertices, all of which use the same transforms into world space and canonical coordinates, it's faster to calculate the product of those two matrices once and put each vertex through the resulting matrix, rather than putting each vertex through both matrices.
But, as far as I can tell, there doesn't seem to be a way in OpenGL ES 2.0 shaders to have it calculate that product once and keep using it until one of the two matrices is changed by glUniformMatrix4fv() (or another function that sets a uniform). So it seems like the only way to calculate the matrix once would be to do it on the CPU and then send the result to the GPU as a uniform. Otherwise, with something like:
gl_Position = uProjection * uMV * aPosition;
the product will be calculated over and over again, which seems like a waste of time.
So, which way is usually considered standard? Or is there a different way that I am completely missing? As far as I could tell, the shader used to implement the OpenGL ES 1.1 pipeline in the OpenGL ES 2.0 Programming Guide only used one matrix, so is that approach more common?
First, the correct OpenGL term for "canonical coordinates" is clip space.
Second, it should be this:
gl_Position = uProjection * (uMV * aPosition);
What you posted does a matrix/matrix multiply followed by a matrix/vector multiply. This version does 2 matrix/vector multiplies. That's a substantial difference.
You're using shader-based hardware; how you handle matrices is up to you. There is nothing that is "considered standard"; you do what best suits your needs.
That being said, unless you are doing lighting in model space, you will often need some intermediary between model space and 4D homogeneous clip-space. This is the space you transform the positions and normals into in order to compute the light direction, dot(N, L), and so forth.
Personally, I wouldn't suggest world space for reasons that I explain thoroughly here. But whether it's world space, camera space, or something else, you will generally have some intermediate space that you need positions to be in. At which point, the above code becomes necessary, and thus there is no time wasted.
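For completeness, when no intermediate space is needed, the variant the question describes (one combined matrix computed on the CPU and uploaded with glUniformMatrix4fv) is simply the following sketch; u_modelViewProjection is a hypothetical name:

uniform mat4 u_modelViewProjection;   // projection * modelView, computed once per object on the CPU
attribute vec4 aPosition;

void main()
{
    // One matrix/vector multiply per vertex; no per-vertex matrix/matrix work.
    gl_Position = u_modelViewProjection * aPosition;
}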