Low shader performance on iPad 1st gen - performance

I have a painting application written using OpenGL ES 1.0 and some Quartz.
I'm trying to rewrite it using OpenGL ES 2.0 for better performance and new features.
I have written two shaders: one renders the user's input to a texture, and the second mixes this texture with some other textures according to some rules.
Then I realized that the second shader takes too long on the 1st-generation iPad: I get only 10-15 fps. The iPad 2 works perfectly at 60+ fps. I was slightly shocked, because the original app (OpenGL ES 1.0) works fine on both devices. It renders only two polygons (but almost fullscreen).
I've tried some optimizations, such as changing precision, commenting out some math operations, and hardcoding some texture calls. It helped a little, but I'm still far from 60 fps. Only when I comment out the call to this shader entirely do I get 60 fps.
Am I missing something? I don't have much experience with OpenGL, but I believe this shader should work fine on both generations of devices, just as the original application does. My vertex and fragment shaders are:
===============Vertex Shader===================
uniform mat4 modelViewProjectionMatrix;
attribute vec3 position;
attribute vec2 texCoords;
varying vec2 fTexCoords;
void main()
{
fTexCoords = texCoords;
vec4 postmp = vec4(position.xyz, 1.0);
gl_Position = modelViewProjectionMatrix * postmp;
}
===============Fragment Shader===================
precision highp float;
varying lowp vec4 colorVarying;
varying highp vec2 fTexCoords;
uniform sampler2D texture; // black & white user should paint
uniform sampler2D drawingTexture; // texture with user drawings I rendered earlier
uniform sampler2D paperTexture; // texture of sheet of paper
uniform float currentArea; // which area we should not shadow
uniform float isShadowingOn; // bool - should we shadow some areas of picture
void main()
{
// I pass 1024*1024 texture here but I only need 560*800 so I do some calculations to find real texture coordinates
vec2 convertedTexCoords = vec2(fTexCoords.x * 560.0/1024.0, fTexCoords.y * 800.0/1024.0);
vec4 bgImageColor = texture2D(texture, convertedTexCoords);
float area = bgImageColor.a;
bgImageColor.a = 1.0;
vec4 paperColor = texture2D(paperTexture, convertedTexCoords);
vec4 drawingColor = texture2D(drawingTexture, convertedTexCoords);
// if special area
if ( abs(area - 1.0) < 0.0001) {
// if shadowing ON
if (isShadowingOn == 1.0) {
// if color of original image is black
if ( (bgImageColor.r < 0.1) && (bgImageColor.g < 0.1) && (bgImageColor.b < 0.1) ) {
gl_FragColor = vec4(bgImageColor.rgb, 1.0) * vec4(0.5, 0.5, 0.5, 1.0);
}
// if color of original image is grey
else if ( abs(bgImageColor.r - bgImageColor.g) < 0.15 && abs(bgImageColor.r - bgImageColor.b) < 0.15 && abs(bgImageColor.g - bgImageColor.b) < 0.15
&& bgImageColor.r < 0.8 && bgImageColor.g < 0.8 && bgImageColor.b < 0.8){
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb * 0.4 - drawingColor.rgb * 0.4, 1.0);
}
else
{
gl_FragColor = vec4(bgImageColor.rgb, 1.0) * vec4(0.5, 0.5, 0.5, 1.0);
}
}
// if shadowing is OFF
else {
// if color of original image is black
if ( (bgImageColor.r < 0.1) && (bgImageColor.g < 0.1) && (bgImageColor.b < 0.1) ) {
gl_FragColor = vec4(bgImageColor.rgb, 1.0);
}
// if color of original image is gray
else if ( abs(bgImageColor.r - bgImageColor.g) < 0.15 && abs(bgImageColor.r - bgImageColor.b) < 0.15 && abs(bgImageColor.g - bgImageColor.b) < 0.15
&& bgImageColor.r < 0.8 && bgImageColor.g < 0.8 && bgImageColor.b < 0.8){
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb * 0.4 - drawingColor.rgb * 0.4, 1.0);
}
// rest
else {
gl_FragColor = vec4(bgImageColor.rgb, 1.0);
}
}
}
// if area of fragment is equal to current area
else if ( abs(area-currentArea/255.0) < 0.0001 ) {
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb - drawingColor.rgb, 1.0);
}
// if area of fragment is NOT equal to current area
else {
if (isShadowingOn == 1.0) {
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb - drawingColor.rgb, 1.0) * vec4(0.5, 0.5, 0.5, 1.0);
} else {
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb - drawingColor.rgb, 1.0);
}
}
}

Branching is really expensive in a shader, because it limits the GPU's ability to run the shader in parallel, and you have a lot of branches in your fragment shader (the one shader that should be as fast as possible anyway). Even worse, you are branching on values computed on the GPU itself, which also drains performance drastically.
You should really try to remove as many branches as possible. Rather, let the GPU do some "extra work", e.g. by not trying to optimize the texture atlas and rendering everything (if that is possible); this will still be faster than your current version. If this doesn't work, try to split your shader into multiple smaller shaders, each of which does only a specific part of the larger shader, and branch on the CPU rather than on the GPU (you only need to do this once per draw call, not for every "pixel").

Beyond JustSid's valid point about branching in the shader, I see a few other things wrong here. First, if I run this fragment shader through Imagination Technologies' PVRUniSCo Editor (which you really should get, and which is part of their free SDK), it shows a best-case performance of 42 cycles and a worst case of 52 for this shader. From a similar case of fragment shader tuning I asked about, I found that an 11-16 cycle fragment shader took 35-68 ms to render on an iPad 1 (15-29 FPS). You're going to need to make this a lot tighter to get reasonable render times for it.
To eliminate some of the branches, you might be able to use a step function or play tricks with your alpha channel. I've done this and seen a massive reduction in shader rendering times. Rather than passing in the isShadowingOn uniform, I would split this into two shaders, one for the case where shadowing is on and one for when it is off.
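To make the step-function idea concrete, here is a small JavaScript sketch (not shader code) that emulates GLSL's step() and mix() for scalars and checks that a branch-free formulation gives the same brightness factor as the if/else logic in the last two cases of the question's shader. The function names are made up for illustration, and isShadowingOn is assumed to be exactly 0.0 or 1.0; whether this saves cycles on a given GPU still has to be measured.

```javascript
// Scalar emulations of the GLSL built-ins step() and mix().
function step(edge, x) { return x < edge ? 0.0 : 1.0; }
function mix(a, b, t) { return a * (1.0 - t) + b * t; }

// Branched version, mirroring the question's shader: full brightness for the
// current area, half brightness elsewhere when shadowing is on.
function shadeBranched(area, currentArea, isShadowingOn) {
  if (Math.abs(area - currentArea / 255.0) < 0.0001) return 1.0;
  return isShadowingOn === 1.0 ? 0.5 : 1.0;
}

// Branch-free version: build a 0/1 mask with step() and blend with mix().
function shadeBranchFree(area, currentArea, isShadowingOn) {
  const diff = Math.abs(area - currentArea / 255.0);
  const notCurrent = step(0.0001, diff);     // 0 inside the current area, 1 elsewhere
  const darken = notCurrent * isShadowingOn; // 1 only where shadowing applies
  return mix(1.0, 0.5, darken);
}
```

In GLSL the same pattern would multiply the final color by the mixed factor instead of choosing between two assignments.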
Beyond branching, I can see that you're performing a dependent texture read for bgImageColor, paperColor, and drawingColor as a result of calculating the texture coordinates to fetch within your fragment shader. This is horribly expensive on the tile-based deferred renderer within iOS devices, because it prevents certain optimizations for texture fetching from being used. Instead of calculating this per-fragment, I recommend moving this calculation to the vertex shader and passing in the result as a varying to your fragment shader. Use that varying as the coordinate to fetch your textures and you'll see a massive boost in performance.
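Moving the coordinate scaling to the vertex shader is safe because the scaling is linear: scaling the per-vertex coordinates and letting the hardware interpolate gives the same result as interpolating first and scaling per fragment. The JavaScript sketch below checks this numerically for the 560/1024 and 800/1024 factors from the question; the helper names are hypothetical and plain linear interpolation stands in for the GPU's varying interpolation.

```javascript
// Scale factors from the question: only 560x800 of the 1024x1024 texture is used.
const SCALE = [560.0 / 1024.0, 800.0 / 1024.0];

function scaleCoords(uv) {
  return [uv[0] * SCALE[0], uv[1] * SCALE[1]];
}

// Linear interpolation between two 2D points, standing in for varying interpolation.
function lerp2(a, b, t) {
  return [a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t];
}

// What the original fragment shader does: interpolate, then scale per fragment.
function perFragment(uvA, uvB, t) {
  return scaleCoords(lerp2(uvA, uvB, t));
}

// The suggested change: scale once per vertex, then interpolate.
function perVertex(uvA, uvB, t) {
  return lerp2(scaleCoords(uvA), scaleCoords(uvB), t);
}
```

With the scaled coordinates arriving as a varying, the fragment shader can pass that varying straight to texture2D.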
There are also smaller things you can do to tweak this. For example,
gl_FragColor = vec4((paperColor.rgb * bgImageColor.rgb - drawingColor.rgb) * 0.4, 1.0);
should be slightly faster than
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb * 0.4 - drawingColor.rgb * 0.4, 1.0);
The editor will live-compile your shader, so you can try out these manipulations in code and see the results in terms of estimated GPU cycles.

Related

Shader - Unexpected behaviour when dividing with a high value

I have this line:
gl_FragColor = vec4(worldPos.x / maxX, worldPos.z / maxZ, 1.0, 1.0);
where worldPos.x and worldPos.z go from 0 to 19900. maxX and maxZ are float uniforms. It works as expected when maxX and maxZ are set to 5000.0 (a gradient to white, and above 5000 it's all white), but when maxX and maxZ are set to 19900.0 it all turns blue. Why is that, and how can I get around it? Hardcoding the values doesn't make a difference, i.e.:
gl_FragColor = vec4(worldPos.x / 5000.0, worldPos.z / 5000.0, 1.0, 1.0);
works as expected while:
gl_FragColor = vec4(worldPos.x / 19900.0, worldPos.z / 19900.0, 1.0, 1.0);
makes it all blue. This only happens on some devices and not on others.
Update:
Adding the highp modifier (as suggested by Michael below) solved it for one device, but on another device it made no difference. Then I tried doing the division on the CPU (also suggested by Michael), like this:
in java, before passing it as uniform:
float maxX = 1.0f / 19900.0f;
float maxZ = 1.0f / 19900.0f;
program.setUniformf(maxXUniform, maxX);
program.setUniformf(maxZUniform, maxZ);
in shader:
uniform float maxX;
uniform float maxZ;
...
gl_FragColor = vec4(worldPos.x * maxX, worldPos.z * maxZ, 1.0, 1.0);
...
Final solution:
This still didn't cut it. The values are now so small that they turn into 0 when passed to the shader, because the float precision is too low. So I tried multiplying the value by 100 before passing it in, and then multiplying by 0.01 inside the shader.
in java:
float maxX = 100.0f / 19900.0f;
float maxZ = 100.0f / 19900.0f;
program.setUniformf(maxXUniform, maxX);
program.setUniformf(maxZUniform, maxZ);
in shader:
uniform float maxX;
uniform float maxZ;
...
gl_FragColor = vec4(worldPos.x * 0.01 * maxX, worldPos.z * 0.01 * maxZ, 1.0, 1.0);
...
And that solved the problem. Now the highp modifier isn't needed. Maybe it isn't the prettiest solution, but it's efficient and robust.
I guess you're running OpenGL ES? Well, floating-point precision sucks on many, usually quite old, devices. I had similar issues on several occasions when implementing cascaded shadow mapping in shaders for mobile hardware.
Make sure you use the highp qualifier for those variables. (Note: that might not solve the issue, but it is worth a try.)
Another possible solution: don't perform the division in the shader. That's quite a heavy operation for many old and weak implementations anyway. Try to avoid division, sqrt() and pow(). Run a shader profiler and you will be surprised to find out how heavy those ops are! (The iOS emulator on Mac has a nice shader profiler.) Try to pass the results directly as uniforms. I am not sure that would be a problem in your case, as I can't see any of these variables bound to per-fragment execution.
And if it still doesn't help, then usually there is nothing you can do about it. That's an issue with old hardware or the GLSL implementation. But I am sure that if you calculate that on the CPU and upload the results as uniforms, it should solve the issue.
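One way to read the symptoms in this thread (an interpretation, not something the posts confirm): GLSL ES 1.00 only requires mediump floats to cover magnitudes between 2^-14 and 2^14, and highp support is optional in fragment shaders. Under that assumption, 19900.0 is too large, 1.0 / 19900.0 is too small, and 100.0 / 19900.0 fits. The JavaScript sketch below only checks the constants from the question against that minimum range; the helper name is made up, and real devices may offer more range than the minimum.

```javascript
// Minimum magnitude range GLSL ES 1.00 requires for mediump floats.
const MEDIUMP_MIN = Math.pow(2, -14); // about 0.000061
const MEDIUMP_MAX = Math.pow(2, 14);  // 16384

// Hypothetical helper: true if x is zero or lies inside the guaranteed mediump range.
function fitsMinimumMediump(x) {
  const magnitude = Math.abs(x);
  return magnitude === 0 || (magnitude >= MEDIUMP_MIN && magnitude <= MEDIUMP_MAX);
}

// Constants that appear in the question.
const candidates = [5000.0, 19900.0, 1.0 / 19900.0, 100.0 / 19900.0];
const report = candidates.map((value) => ({ value, fits: fitsMinimumMediump(value) }));
console.log(report);
```

This would also be consistent with the highp qualifier helping on one device and not on another, since a device without fragment-shader highp cannot honor it.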

Difficulty with proper layering in THREE.js scene

I am working on a hex-based game. I am currently trying to add a "fog of war" effect where certain tiles lie under an alpha mask to show that information is unknown. Unfortunately, I'm running into some problems achieving the effect that I want. The way I'm implementing the fog is to create a mesh over all the tiles that has no alpha if the tile is "visible" and .7 if it is not. I then adjust the mesh position based on the camera position so it always stays in perspective. This is the effect:
Unfortunately, the first way I tried to do this has an undesired effect at low viewing angles. Because I'm shifting the fog to lay over tiles even as perspective changes, at low angles it will also cover the tops of mountains and trees. See below:
The second thing I tried was implementing the two-scene solution from "How to change the zOrder of object with Threejs?". I put the fog and the unseen tiles in one scene and the seen tiles in another, and then rendered the seen tiles on top of the unseen ones. That solved the darkness problem for far tiles, but it introduces another problem for near tiles. See below:
I'm a little stumped as to what to do. I'm fairly new to THREE.js (at least the more advanced parts of the library), so I'm wondering if there's something I'm missing that might work.
For reference, here's my vertex shader for the fog:
varying vec4 vColor;
void main() {
vec3 cRel = cameraPosition - position;
float dx = (20.0 * cRel.x) / cRel.y;
float dz = (20.0 * cRel.z) / cRel.y;
gl_Position = projectionMatrix *
modelViewMatrix *
vec4(
position.x + dx,
position.y,
position.z + dz,
1.0
);
if(color.x == 1.0 && color.y == 1.0 && color.z == 1.0) {
vColor = vec4(0.0, 0.0, 0.0, 0.0);
} else {
vColor = vec4(color, 0.7);
}
}
and my fragment shader:
varying vec4 vColor;
float expGradient(float val, float max) {
return (max + 1.0 / 10.0) * val / (val + 1.0 / 10.0);
}
void main() {
gl_FragColor = vec4(
vColor.x,
vColor.y,
vColor.z,
expGradient(vColor.w, 0.7)
);
}
I'm using the color of (1.0, 1.0, 1.0) to signify that it should be "seen".

How to implement a ShaderToy shader in three.js?

I'm looking for info on how to recreate the ShaderToy parameters iGlobalTime, iChannel, etc. within three.js. I know that iGlobalTime is the time elapsed since the shader started, and I think the iChannel stuff is for pulling RGB out of textures, but I would appreciate info on how to set these.
Edit: I have been going through all the shaders that come with the three.js examples and think that the answers are all in there somewhere; I just have to find the equivalent, e.g. iChannel1 = a texture input, etc.
I am not sure whether you have already answered your question, but it might be good for others to know the steps for integrating Shadertoys into THREEJS.
First, you need to know that a Shadertoy is a fragment shader. That being said, you have to set up a "general purpose" vertex shader that should work with all Shadertoys (fragment shaders).
Step 1
Create a "general purpose" vertex shader
varying vec2 vUv;
void main()
{
vUv = uv;
vec4 mvPosition = modelViewMatrix * vec4(position, 1.0 );
gl_Position = projectionMatrix * mvPosition;
}
This vertex shader is pretty basic. Notice that we defined a varying variable vUv to pass the texture coordinates to the fragment shader. This is important because we are not going to use the screen resolution (iResolution) for our base rendering; we will use the texture coordinates instead. We do that so that multiple Shadertoys can be integrated on different objects in the same THREEJS scene.
Step 2
Pick the Shadertoy you want and create the fragment shader. (I have chosen a simple toy that performs well: Simple tunnel 2D by niklashuss).
Here is the given code for this toy:
void main(void)
{
vec2 p = gl_FragCoord.xy / iResolution.xy;
vec2 q = p - vec2(0.5, 0.5);
q.x += sin(iGlobalTime* 0.6) * 0.2;
q.y += cos(iGlobalTime* 0.4) * 0.3;
float len = length(q);
float a = atan(q.y, q.x) + iGlobalTime * 0.3;
float b = atan(q.y, q.x) + iGlobalTime * 0.3;
float r1 = 0.3 / len + iGlobalTime * 0.5;
float r2 = 0.2 / len + iGlobalTime * 0.5;
float m = (1.0 + sin(iGlobalTime * 0.5)) / 2.0;
vec4 tex1 = texture2D(iChannel0, vec2(a + 0.1 / len, r1 ));
vec4 tex2 = texture2D(iChannel1, vec2(b + 0.1 / len, r2 ));
vec3 col = vec3(mix(tex1, tex2, m));
gl_FragColor = vec4(col * len * 1.5, 1.0);
}
Step 3
Customize the shadertoy raw code to have a complete GLSL fragment shader.
The first things missing from the code are the uniform and varying declarations. Add them at the top of your fragment shader file (just copy and paste the following):
uniform float iGlobalTime;
uniform sampler2D iChannel0;
uniform sampler2D iChannel1;
varying vec2 vUv;
Note, only the shadertoys variables used for that sample are declared, plus the varying vUv previously declared in our vertex shader.
The last thing we have to tweak is the UV mapping, now that we have decided not to use the screen resolution. To do so, just replace the line that uses the iResolution uniform, i.e.:
vec2 p = gl_FragCoord.xy / iResolution.xy;
with:
vec2 p = -1.0 + 2.0 *vUv;
That's it, your shaders are now ready for usage in your THREEJS scenes.
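The two edits from Step 3 can also be applied programmatically. The JavaScript sketch below is only an illustration under stated assumptions: the helper name toThreeFragmentShader is hypothetical, it expects old-style Shadertoy source with a void main, it declares only the uniforms used by this particular toy, and it relies on a simple regular expression rather than a GLSL parser.

```javascript
// Hypothetical helper: applies the Step 3 edits to raw Shadertoy source text.
function toThreeFragmentShader(toySource) {
  // Declarations missing from the raw toy code (only those this sample needs).
  const header = [
    'uniform float iGlobalTime;',
    'uniform sampler2D iChannel0;',
    'uniform sampler2D iChannel1;',
    'varying vec2 vUv;',
  ].join('\n');

  // Replace the screen-resolution based coordinate with one derived from vUv.
  const body = toySource.replace(
    /vec2\s+(\w+)\s*=\s*gl_FragCoord\.xy\s*\/\s*iResolution\.xy\s*;/,
    'vec2 $1 = -1.0 + 2.0 * vUv;'
  );

  return header + '\n' + body;
}
```

The returned string can then be used as the fragmentShader value of the THREE.ShaderMaterial.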
Step 4
Your THREEJS code:
Set up uniform:
var tuniform = {
iGlobalTime: { type: 'f', value: 0.1 },
iChannel0: { type: 't', value: THREE.ImageUtils.loadTexture( 'textures/tex07.jpg') },
iChannel1: { type: 't', value: THREE.ImageUtils.loadTexture( 'textures/infi.jpg' ) },
};
Make sure the textures are wrapping:
tuniform.iChannel0.value.wrapS = tuniform.iChannel0.value.wrapT = THREE.RepeatWrapping;
tuniform.iChannel1.value.wrapS = tuniform.iChannel1.value.wrapT = THREE.RepeatWrapping;
Create the material with your shaders and add it to a plane geometry. The PlaneGeometry will simulate the Shadertoy 700x394 screen resolution; in other words, it will best convey the work the artist intended to share.
var mat = new THREE.ShaderMaterial( {
uniforms: tuniform,
vertexShader: vshader,
fragmentShader: fshader,
side:THREE.DoubleSide
} );
var tobject = new THREE.Mesh( new THREE.PlaneGeometry(700, 394,1,1), mat);
Finally, in your update function, add the delta from THREE.Clock() to the iGlobalTime value, not the total time.
tuniform.iGlobalTime.value += clock.getDelta();
That is it, you are now able to run most of the shadertoys with this setup...
2022 edit: The version of Shaderfrog described below is no longer being actively developed. There are bugs in the compiler used making it not able to parse all shaders correctly for import, and it doesn't support many of Shadertoy's features, like multiple image buffers. I'm working on a new tool if you want to follow along, otherwise you can try the following method, but it likely won't work most of the time.
Original answer follows:
This is an old thread, but there's now an automated way to do this. Simply go to http://shaderfrog.com/app/editor/new and on the top right click "Import > ShaderToy" and paste in the URL. If it's not public you can paste in the raw source code. Then you can save the shader (requires sign up, no email confirm), and click "Export > Three.js".
You might need to tweak the parameters a little after import, but I hope to have this improved over time. For example, ShaderFrog doesn't support audio nor video inputs yet, but you can preview them with images instead.
Proof of concept:
ShaderToy https://www.shadertoy.com/view/MslGWN
ShaderFrog http://shaderfrog.com/app/view/247
Full disclosure: I am the author of this tool which I launched last week. I think this is a useful feature.
This is based on various sources, including the answer of @INF1.
Basically, you insert the missing uniform variables from Shadertoy (iGlobalTime etc., see this list: https://www.shadertoy.com/howto) into the fragment shader, then you rename mainImage(out vec4 z, in vec2 w) to main(), and then you change z in the source code to 'gl_FragColor'. In most Shadertoys 'z' is 'fragColor'.
I did this for two cool shaders from this guy (https://www.shadertoy.com/user/guil) but unfortunately I didn't get the marble example to work (https://www.shadertoy.com/view/MtX3Ws).
A working jsFiddle is here: https://jsfiddle.net/dirkk0/zt9dhvqx/
Change the shader from frag1 to frag2 in line 56 to see both examples.
And don't 'Tidy' in jsFiddle - it breaks the shaders.
EDIT:
https://medium.com/@dirkk/converting-shaders-from-shadertoy-to-threejs-fe17480ed5c6
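As a rough illustration of the renaming described in this answer, the JavaScript sketch below rewrites a mainImage entry point by plain text substitution. The helper name convertMainImage is hypothetical. Replacing the second parameter with gl_FragCoord.xy is an extra assumption that the answer does not spell out, and the missing uniforms still have to be declared separately. It is not a GLSL parser, so unusual formatting may defeat it.

```javascript
// Hypothetical helper: renames mainImage(out vec4 z, in vec2 w) to main() and
// substitutes the two parameter names throughout the source text.
function convertMainImage(source) {
  const match = source.match(
    /void\s+mainImage\s*\(\s*out\s+vec4\s+(\w+)\s*,\s*in\s+vec2\s+(\w+)\s*\)/
  );
  if (!match) return source; // nothing to convert

  const [signature, outName, coordName] = match;

  // Match the identifier only as a whole word, and not as a swizzle such as ".z".
  const wholeWord = (name) => new RegExp('(?<![\\w.])' + name + '(?!\\w)', 'g');

  return source
    .replace(signature, 'void main()')
    .replace(wholeWord(outName), 'gl_FragColor')
    .replace(wholeWord(coordName), 'gl_FragCoord.xy');
}
```

Source without a mainImage function is returned unchanged, so the helper can be applied to old-style toys as well.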

Volume ray casting doesn't work fine (Webgl + GLSL + Three.js)

I have tried to improve the quality of my volume ray casting algorithm. I set a smaller ray-casting step (the quality is better), but it causes a problem, shown in the pictures below (black areas where there shouldn't be any).
I am using an RGB cube to get the direction of the ray in the volume.
I think I have the same algorithm as in: volume rendering (using glsl) with ray casting algorithm
Does anybody have an idea where the problem could be? I need to resolve this because the deadline for my diploma thesis is too close :( I really don't know why it doesn't work :(
EDIT:
I can't show all of my code here (it could be a problem if I publish it before handing it in at school). But here is the key code for going through the volume:
// All variables needed for the rays
vec3 rayDirection = texture2D(backFaceCube, texCoo).xyz - varcolor.xyz;
float lenRay = length(rayDirection);
vec3 normDir = normalize(rayDirection);
float d = qualitySteps; //quality steps is size of steps defined by user -> example: 0.01, 0.001, 0.0001 etc.
vec3 step = normDir * d;
float lenStep = length(step);
float accumulatedLength = 0.0;
and then in the loop:
posInCube.xyz += step;
accumulatedLength += lenStep;
...
...
...
if(accumulatedLength >= lenRay || accumulatedColor.a > 1.0 ) {
break;
}
EDIT2: (sorry, this was too long for a comment)
Yes, the texture is noisy. I tried removing the alpha condition, if(accumulatedColor.a > 1.0), but the result is the same.
I think there is a direct correlation between the length of the ray and the size of the step. I tried many combinations and found the following.
If the step is big, I am able to go through the whole volume, but if it is small, I am (maybe) not able to go through the volume. If the step is extremely big, I can see a mirrored object (this could be caused by the texture repeating when I go outside the texture on the GPU). If the step is too small, I am able to map only a small part of the texture; it seems that the ray is too short, but in reality it isn't. The question is why the mapping of 3D coordinates to the 2D texture is wrong and depends on the size of the step.
Can you please supply the code for your fragment shader?
Are you traversing the whole vector from the front to the end position? Here's an example shader (the code might contain some errors, since I just wrote it off the top of my head; unfortunately I can't test the code on my computer at the moment):
#version 130
in vec2 texCoord;
out vec4 outColor;
uniform float stepSize;
uniform int numSteps;
uniform sampler2D frontTexture;
uniform sampler2D backTexture;
uniform sampler3D volumeTexture;
uniform sampler1D transferTexture; // Density to RGBA
void main()
{
    vec4 color = vec4(0.0);
    // Ray entry and exit points, stored as colors in the front/back face textures
    vec3 startPosition = texture(frontTexture, texCoord).xyz;
    vec3 endPosition = texture(backTexture, texCoord).xyz;
    // One step along the ray, from the entry point towards the exit point
    vec3 delta = normalize(endPosition - startPosition) * stepSize;
    vec3 position = startPosition;
    for (int i = 0; i < numSteps; ++i)
    {
        float density = texture(volumeTexture, position).r;
        vec4 voxelColor = texture(transferTexture, density);
        // Sampling distance correction (applied to the sample's opacity)
        voxelColor.a = 1.0 - pow(1.0 - voxelColor.a, stepSize * 500.0);
        // Front to back blending (no shading done)
        color.rgb = color.rgb + (1.0 - color.a) * voxelColor.a * voxelColor.rgb;
        color.a = color.a + (1.0 - color.a) * voxelColor.a;
        if (color.a >= 1.0)
        {
            break;
        }
        // Advance
        position += delta;
        // Stop once the ray leaves the unit cube
        if (any(lessThan(position, vec3(0.0))) || any(greaterThan(position, vec3(1.0))))
        {
            break;
        }
    }
    outColor = color;
}
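The question of whether the whole ray is traversed can be checked with simple arithmetic. WebGL 1 shaders (GLSL ES 1.00) need a constant loop bound, so one possible explanation for "the ray seems too short" with small steps is that the loop runs out of iterations before the ray ends. That is an assumption, since the asker's loop is not shown. The JavaScript sketch below uses hypothetical helper names and an illustrative cap of 256 iterations to show how far a capped loop gets.

```javascript
// Number of steps needed to march a ray of the given length.
function stepsNeeded(rayLength, stepSize) {
  return Math.ceil(rayLength / stepSize);
}

// Distance actually covered when the loop is capped at maxSteps iterations.
function distanceCovered(rayLength, stepSize, maxSteps) {
  const steps = Math.min(stepsNeeded(rayLength, stepSize), maxSteps);
  return Math.min(steps * stepSize, rayLength);
}

// With a cap of 256 iterations, a coarse step reaches the end of a unit-length
// ray, while a fine step stops after a quarter of it.
const coarse = distanceCovered(1.0, 1 / 4, 256);  // covers the full ray
const fine = distanceCovered(1.0, 1 / 1024, 256); // covers only part of it
console.log({ coarse, fine });
```

If this is the cause, raising the loop bound or deriving the step from the ray length (ray length divided by the iteration count) would keep the traversal complete.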

How to emulate GL_DEPTH_CLAMP_NV?

I have a platform where this extension is not available (non-NVIDIA).
How could I emulate this functionality?
I need it to solve far plane clipping problem when rendering stencil shadow volumes with z-fail algorithm.
Since you say you're using OpenGL ES, but also mentioned trying to clamp gl_FragDepth, I'm assuming you're using OpenGL ES 2.0, so here's a shader trick:
You can emulate ARB_depth_clamp by using a separate varying for the z-component.
Vertex Shader:
varying float z;
void main()
{
gl_Position = ftransform();
// transform z to window coordinates
z = gl_Position.z / gl_Position.w;
z = (gl_DepthRange.diff * z + gl_DepthRange.near + gl_DepthRange.far) * 0.5;
// prevent z-clipping
gl_Position.z = 0.0;
}
Fragment shader:
varying float z;
void main()
{
gl_FragColor = vec4(vec3(z), 1.0);
gl_FragDepth = clamp(z, 0.0, 1.0);
}
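The arithmetic in the two shaders above can be followed with plain numbers. The JavaScript sketch below mirrors the vertex shader's conversion from clip space to window depth and the fragment shader's clamp. The function names are made up for illustration, and near and far stand for gl_DepthRange.near and gl_DepthRange.far, which are 0 and 1 by default.

```javascript
// Window-space depth from clip-space z and w, as computed in the vertex shader.
function windowDepth(clipZ, clipW, near, far) {
  const ndcZ = clipZ / clipW; // normalized device coordinate, -1..1 inside the frustum
  return ((far - near) * ndcZ + near + far) * 0.5;
}

// Depth written by the fragment shader: values beyond the planes are clamped, not clipped.
function clampedDepth(clipZ, clipW, near, far) {
  const z = windowDepth(clipZ, clipW, near, far);
  return Math.min(Math.max(z, 0.0), 1.0);
}

// A vertex beyond the far plane (ndc z = 3) ends up at depth 1 instead of being clipped.
const beyondFar = clampedDepth(3.0, 1.0, 0.0, 1.0);
console.log(beyondFar);
```

Geometry in front of the near plane is handled the same way and ends up at depth 0.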
"Fall back" to ARB_depth_clamp?
Check if NV_depth_clamp exists anyway? For example my ATI card supports five "NVidia-only" GL extensions.

Resources