Shader - Unexpected behaviour when dividing with a high value - opengl-es

I have this line:
gl_FragColor = vec4(worldPos.x / maxX, worldPos.z / maxZ, 1.0, 1.0);
where worldPos.x and worldPos.z go from 0 to 19900. maxX and maxZ are float uniforms. It works as expected when maxX and maxZ are set to 5000.0 (a gradient to white, and above 5000 it's all white), but when maxX and maxZ are set to 19900.0 it all turns blue. Why is that, and how can I get around it? Hardcoding the values doesn't make a difference, i.e.:
gl_FragColor = vec4(worldPos.x / 5000.0, worldPos.z / 5000.0, 1.0, 1.0);
works as expected while:
gl_FragColor = vec4(worldPos.x / 19900.0, worldPos.z / 19900.0, 1.0, 1.0);
makes it all blue. This only happens on some devices and not on others.
Update:
Adding the highp modifier (as suggested by Michael below) solved it on one device, but on another device it made no difference. Then I tried doing the division on the CPU (also suggested by Michael), like this:
in java, before passing it as uniform:
float maxX = 1.0f / 19900.0f;
float maxZ = 1.0f / 19900.0f;
program.setUniformf(maxXUniform, maxX);
program.setUniformf(maxZUniform, maxZ);
in shader:
uniform float maxX;
uniform float maxZ;
...
gl_FragColor = vec4(worldPos.x * maxX, worldPos.z * maxZ, 1.0, 1.0);
...
Final solution:
This still didn't cut it. The values are now so small that they turn into 0 when passed to the shader, because the float precision is too low. So I tried multiplying by 100 before passing the value in, and then multiplying by 0.01 inside the shader.
in java:
float maxX = 100.0f / 19900.0f;
float maxZ = 100.0f / 19900.0f;
program.setUniformf(maxXUniform, maxX);
program.setUniformf(maxZUniform, maxZ);
in shader:
uniform float maxX;
uniform float maxZ;
...
gl_FragColor = vec4(worldPos.x * 0.01 * maxX, worldPos.z * 0.01 * maxZ, 1.0, 1.0);
...
And that solved the problem. Now the highp modifier isn't needed. Maybe it isn't the prettiest solution, but it's efficient and robust.

I guess you're running OpenGL ES? Well, floating-point precision sucks on many, usually quite old, devices. I had similar issues on several occasions when implementing cascaded shadow mapping in shaders for mobile hardware.
Make sure you use the highp qualifier for those variables. (Note: that might not solve the issue, but it is worth a try.)
Another possible solution: don't perform the division in the shader. That's quite a heavy operation for many old and weak implementations anyway. Try to avoid division, sqrt(), and pow(). Run a shader profiler and you will be surprised to find out how HEAVY those ops are! (The iOS emulator on Mac has a nice shader profiler.) Try to pass the results directly as uniforms. I am not sure that would be a problem in your case, as I can't see any of these variables bound to per-fragment execution.
And if it still doesn't help, then usually there is nothing you can do about it. That's an old hardware/GLSL implementation issue. But I am sure that if you calculate that on the CPU and upload the results as uniforms, that should solve the issue.
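One plausible explanation for the numbers in the question: GLSL ES 1.00 only guarantees a mediump float range of roughly 2^-14 to 2^14 in magnitude (about 0.000061 to 16384). A divisor of 5000.0 fits in that range, 19900.0 does not, and 1.0 / 19900.0 (about 0.00005) falls below the lower bound, which would match the uniform turning into 0. A minimal sketch that combines both suggestions above with the scaling from the question's final solution follows. It assumes worldPos is a varying written by the vertex shader; the uniform names scaledInvMaxX and scaledInvMaxZ are hypothetical and stand for 100.0 / maxX and 100.0 / maxZ computed on the CPU.

```glsl
// Request highp only where the fragment stage actually supports it.
// GL_FRAGMENT_PRECISION_HIGH is predefined by GLSL ES 1.00 in that case.
#ifdef GL_FRAGMENT_PRECISION_HIGH
precision highp float;
#else
precision mediump float;
#endif

// Assumption: world-space position interpolated from the vertex shader.
varying vec3 worldPos;

// Hypothetical uniforms: 100.0 / maxX and 100.0 / maxZ, computed on the CPU.
// For maxX = 19900.0 this is about 0.005, well inside the mediump range.
uniform float scaledInvMaxX;
uniform float scaledInvMaxZ;

void main() {
    // Scale the position down first (19900 * 0.01 = 199) so that every
    // intermediate value stays within the guaranteed mediump range.
    float x = worldPos.x * 0.01 * scaledInvMaxX;
    float z = worldPos.z * 0.01 * scaledInvMaxZ;
    gl_FragColor = vec4(x, z, 1.0, 1.0);
}
```

If the varying itself is stored at mediump on a given device, values near 19900 may still be out of range before the multiplication, so scaling the position in the vertex shader might be needed as well.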

Related

Showing Point Cloud Structure using Lighting in Three.js

I am generating a point cloud representing a rock using Three.js, but am facing a problem with visualizing its structure clearly. In the second screenshot below I would like to be able to denote the topography of the rock, like the corner (shown better in the third screenshot) of the structure, in a more explicit way, as I want to be able to maneuver around the rock and select different points. I have rocks that are more sparse (harder to see structure as points very far away) and more dense (harder to see structure from afar because points all mashed together, like first screenshot but even when closer to the rock), and finding a generalized way to approach this problem has been difficult.
I posted about this problem before here, thinking that representing the ‘depth’ of the rock into the screen would suffice, but after attempting the proposed solution I still could not find a nice way to represent the topography better. Is there a way to add a source of light that my shaders can pick up on? I want to see whether I can represent the colors differently based on their orientation to the source. Using a different software, a friend was able to produce the below image - is there a way to simulate this in Three.js?
For context, I am using Points with a BufferGeometry and ShaderMaterial. Below is the shader code I currently have:
Vertex:
precision mediump float;
varying vec3 vColor;
attribute float alpha;
varying float vAlpha;
uniform float scale;
void main() {
vAlpha = alpha;
vColor = color;
vec4 mvPosition = modelViewMatrix * vec4( position, 1.0 );
#ifdef USE_SIZEATTENUATION
//bool isPerspective = ( projectionMatrix[ 2 ][ 3 ] == - 1.0 );
//if ( isPerspective ) gl_PointSize *= ( scale / -mvPosition.z );
#endif
gl_PointSize = 2.0;
gl_Position = projectionMatrix * mvPosition;
}
and
Fragment:
#ifdef GL_OES_standard_derivatives
#extension GL_OES_standard_derivatives : enable
#endif
precision mediump float;
varying vec3 vColor;
varying float vAlpha;
uniform vec2 u_depthRange;
float LinearizeDepth(float depth, float near, float far)
{
float z = depth * 2.0 - 1.0; // Back to NDC
return (2.0 * near * far / (far + near - z * (far - near)) - near) / (far-near);
}
void main() {
float r = 0.0, delta = 0.0, alpha = 1.0;
vec2 cxy = 2.0 * gl_PointCoord.xy - 1.0;
r = dot(cxy, cxy);
float lineardepth = LinearizeDepth(gl_FragCoord.z, u_depthRange[0], u_depthRange[1]);
if (r > 1.0) {
discard;
}
// Reset back to 1.0 instead of using the lineardepth method above
gl_FragColor = vec4(vColor, 1.0);
}
Thank you so much for your help!

vec2 division acting weirdly (automatic aspect-ratio correction?)

Running into some issues with vec2 division in OpenGL ES with WebGL: specifically, it seems to automatically deal with aspect ratios. My understanding is that:
someVec2 / anotherVec2 = vec2(
someVec2.x / anotherVec2.x,
someVec2.y / anotherVec2.y)
i.e., it is component-wise.
However, this code (where uResolution is an ivec2 passed from the code, of the current resolution):
vec2 uv = gl_FragCoord.xy / float(uResolution);
gl_FragColor = vec4(uv.x, uv.y, 0.0, 1.0);
produces:
whereas
vec2 fragCoordUv = vec2(
gl_FragCoord.x / float(uResolution.x),
gl_FragCoord.y / float(uResolution.y)
);
gl_FragColor = vec4(fragCoordUv.x, fragCoordUv.y, 0.0, 1.0);
produces:
Specifically, the Y value doesn't seem to scale up all the way. The issue becomes more obvious if you use a texture. i.e. straight division:
vs. manual component division:
It looks like it's automatically performing aspect ratio correction. Is this a feature? I can't seem to find any information on it anywhere. Everything states that normal binary operators (+, -, /, *, etc) just work component-based. Please help!
Try with
vec2 uv = gl_FragCoord.xy / vec2(uResolution);
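The likely reason this works: a scalar constructor applied to a vector keeps only the first component, so float(uResolution) is the same as float(uResolution.x), and both x and y end up divided by the width. That looks like aspect-ratio correction, but it is just a vec2 divided by a scalar. A minimal fragment shader sketch of the difference, assuming uResolution holds the canvas size in pixels as in the question:

```glsl
precision mediump float;

// Assumed to be set by the application to the canvas size in pixels.
uniform ivec2 uResolution;

void main() {
    // float(uResolution) keeps only uResolution.x, so this line would
    // divide both components by the width:
    //   vec2 uv = gl_FragCoord.xy / float(uResolution);

    // Converting to vec2 keeps both components, so the division is
    // component-wise: x by the width, y by the height.
    vec2 uv = gl_FragCoord.xy / vec2(uResolution);
    gl_FragColor = vec4(uv.x, uv.y, 0.0, 1.0);
}
```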

GLSL shader algorithm optimization

Is there any way to optimize the following algorithm to make it faster, even if it is just a small speed increase?
const mat3 factor = mat3(1.0, 1.0, 1.0, 2.112, 1.4, 0.0, 0.0, 2.18, -2.21);
vec3 calculate(in vec2 coord)
{
vec3 sample = texture2D(texture_a, coord).rgb;
return (factor / sample) * 2.15;
}
The only significant optimization I can think of is to pack texture_a and texture_b into a single three-channel texture, if you can. That saves you one of the two texture lookups, which are most likely to be the bottleneck here.
@Thomas's answer is the most helpful, since texture lookups are the most expensive part, provided his solution is possible in your application. If you already use those textures somewhere else, it is better to pass the values as parameters to avoid duplicate lookups.
Otherwise, I don't know if it can be optimized that much, but here are some straightforward things that come to mind.
Compiler optimizations:
Add the const keyword to the coord parameter and, if possible, to sample too.
Use the f suffix on each float literal.
Maybe assign the matrix manually
I don't know if it's faster, because I don't know how the matrix multiplication is implemented, but since the constant factor matrix contains many ones and zeros, the result can perhaps be assigned manually.
vec3 calculate(const in vec2 coord)
{
// not 100% sure this const init is possible: older GLSL versions require a constant expression here
const vec3 sample = vec3(texture2D(texture_a, coord).r,
texture2D(texture_b, coord).ra - 0.5f);
vec3 result = vec3(sample.y);
result.x += sample.x + sample.z;
result.y += 2.112f * sample.x;
result.z *= 2.18f;
result.z -= 2.21f * sample.z;
return result;
}

Low shader performance on iPad 1st gen

I have a painting application written using OpenGL ES 1.0 and some Quartz.
I'm trying to rewrite it using OpenGL ES 2.0 for better performance and new features.
I have written two shaders: one renders the user's input to a texture, and the second mixes this texture with some other textures according to some rules.
Suddenly I realized that the second shader runs too slowly on the 1st-generation iPad: I get only 10-15 fps. The iPad 2 works perfectly at 60+ fps. I was slightly shocked, because the original app (OpenGL ES 1.0) works fine on both devices. It renders only two polygons (but almost fullscreen).
I've tried some optimizations, like changing precision, commenting out some math operations, and hardcoding some texture calls. It helped a little, but I'm still far from 60 fps. Only when I comment out the call to this shader entirely do I get 60 fps.
Am I missing something? I don't have much experience with OpenGL, but I do believe this shader should work fine on both generations of devices, just as the original application does. My vertex and fragment shaders are:
===============Vertex Shader===================
uniform mat4 modelViewProjectionMatrix;
attribute vec3 position;
attribute vec2 texCoords;
varying vec2 fTexCoords;
void main()
{
fTexCoords = texCoords;
vec4 postmp = vec4(position.xyz, 1.0);
gl_Position = modelViewProjectionMatrix * postmp;
}
===============Fragment Shader===================
precision highp float;
varying lowp vec4 colorVarying;
varying highp vec2 fTexCoords;
uniform sampler2D texture; // black & white user should paint
uniform sampler2D drawingTexture; // texture with user drawings I rendered earlier
uniform sampler2D paperTexture; // texture of sheet of paper
uniform float currentArea; // which area we should not shadow
uniform float isShadowingOn; // bool - should we shadow some areas of picture
void main()
{
// I pass 1024*1024 texture here but I only need 560*800 so I do some calculations to find real texture coordinates
vec2 convertedTexCoords = vec2(fTexCoords.x * 560.0/1024.0, fTexCoords.y * 800.0/1024.0);
vec4 bgImageColor = texture2D(texture, convertedTexCoords);
float area = bgImageColor.a;
bgImageColor.a = 1.0;
vec4 paperColor = texture2D(paperTexture, convertedTexCoords);
vec4 drawingColor = texture2D(drawingTexture, convertedTexCoords);
// if special area
if ( abs(area - 1.0) < 0.0001) {
// if shadowing ON
if (isShadowingOn == 1.0) {
// if color of original image is black
if ( (bgImageColor.r < 0.1) && (bgImageColor.g < 0.1) && (bgImageColor.b < 0.1) ) {
gl_FragColor = vec4(bgImageColor.rgb, 1.0) * vec4(0.5, 0.5, 0.5, 1.0);
}
// if color of original image is grey
else if ( abs(bgImageColor.r - bgImageColor.g) < 0.15 && abs(bgImageColor.r - bgImageColor.b) < 0.15 && abs(bgImageColor.g - bgImageColor.b) < 0.15
&& bgImageColor.r < 0.8 && bgImageColor.g < 0.8 && bgImageColor.b < 0.8){
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb * 0.4 - drawingColor.rgb * 0.4, 1.0);
}
else
{
gl_FragColor = vec4(bgImageColor.rgb, 1.0) * vec4(0.5, 0.5, 0.5, 1.0);
}
}
// if shadowing is OFF
else {
// if color of original image is black
if ( (bgImageColor.r < 0.1) && (bgImageColor.g < 0.1) && (bgImageColor.b < 0.1) ) {
gl_FragColor = vec4(bgImageColor.rgb, 1.0);
}
// if color of original image is gray
else if ( abs(bgImageColor.r - bgImageColor.g) < 0.15 && abs(bgImageColor.r - bgImageColor.b) < 0.15 && abs(bgImageColor.g - bgImageColor.b) < 0.15
&& bgImageColor.r < 0.8 && bgImageColor.g < 0.8 && bgImageColor.b < 0.8){
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb * 0.4 - drawingColor.rgb * 0.4, 1.0);
}
// rest
else {
gl_FragColor = vec4(bgImageColor.rgb, 1.0);
}
}
}
// if area of fragment is equal to current area
else if ( abs(area-currentArea/255.0) < 0.0001 ) {
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb - drawingColor.rgb, 1.0);
}
// if area of fragment is NOT equal to current area
else {
if (isShadowingOn == 1.0) {
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb - drawingColor.rgb, 1.0) * vec4(0.5, 0.5, 0.5, 1.0);
} else {
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb - drawingColor.rgb, 1.0);
}
}
}
Branching is really expensive in a shader, as it limits the GPU's ability to run the shader in parallel, and you have a lot of branches in your fragment shader (the one shader that should be as fast as possible anyway). Even worse, you are branching on values computed on the GPU itself, which also drastically hurts performance.
You really should try to remove as many branches as possible. Rather, let the GPU do some "extra work", e.g. by not trying to optimize the texture atlas and rendering everything (if this is possible); this will still be faster than your current version. If this doesn't work, try to split your shader into multiple smaller shaders, each of which does only a specific part of your larger shader, and branch on the CPU rather than on the GPU (you only need to do this once per draw call and not for every "pixel").
Beyond JustSid's valid point about branching in the shader, I see a few other things wrong here. First, if I just run this fragment shader through Imagination Technologies' PVRUniSCo Editor (which you really should get, and is part of their free SDK), I see this:
which shows a best-case performance of 42 cycles, worst of 52 for this shader. From a similar case of fragment shader tuning I asked about, I found that an 11-16 cycle fragment shader took 35-68 ms to render on an iPad 1 (15 - 29 FPS). You're going to need to make this a lot tighter to get reasonable render times for it.
To eliminate some of the branches, you might be able to use a step function or play tricks with your alpha channel. I've done this and seen a massive reduction in shader rendering times. I would not pass in the isShadowingOn uniform, but I would split this into two shaders to use in the different cases of this being on and off.
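As a rough sketch of the step-function idea, the last two cases of the shader (fragment area equal or not equal to the current area) could be folded into branch-free arithmetic as below. This is only an illustration under several assumptions: it ignores the "special area" case entirely, it assumes isShadowingOn is exactly 0.0 or 1.0, and it assumes fTexCoords already holds the final texture coordinates. It keeps isShadowingOn as a uniform for brevity, whereas splitting into two shaders would bake that value in.

```glsl
precision highp float;

varying vec2 fTexCoords;          // assumed to be the final texture coordinates
uniform sampler2D texture;
uniform sampler2D drawingTexture;
uniform sampler2D paperTexture;
uniform float currentArea;
uniform float isShadowingOn;      // assumed to be exactly 0.0 or 1.0

void main() {
    vec4 bgImageColor = texture2D(texture, fTexCoords);
    vec4 paperColor = texture2D(paperTexture, fTexCoords);
    vec4 drawingColor = texture2D(drawingTexture, fTexCoords);
    float area = bgImageColor.a;

    // 1.0 when the fragment belongs to the current area, 0.0 otherwise.
    // step(edge, x) returns 0.0 for x < edge, so the result is inverted.
    float isCurrent = 1.0 - step(0.0001, abs(area - currentArea / 255.0));

    // 0.5 when shadowing is on, 1.0 when it is off.
    float shadowFactor = mix(1.0, 0.5, isShadowingOn);

    // The current area is never darkened; other areas use shadowFactor.
    float shade = mix(shadowFactor, 1.0, isCurrent);

    vec3 rgb = paperColor.rgb * bgImageColor.rgb - drawingColor.rgb;
    gl_FragColor = vec4(rgb * shade, 1.0);
}
```

Whether this is actually faster on a given GPU is something to confirm with the cycle estimates from the shader editor.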
Beyond branching, I can see that you're performing a dependent texture read for bgImageColor, paperColor, and drawingColor as a result of calculating the texture coordinates to fetch within your fragment shader. This is horribly expensive on the tile-based deferred renderer within iOS devices, because it prevents certain optimizations for texture fetching from being used. Instead of calculating this per-fragment, I recommend moving this calculation to the vertex shader and passing in the result as a varying to your fragment shader. Use that varying as the coordinate to fetch your textures and you'll see a massive boost in performance.
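A minimal sketch of that change, based on the shaders in the question: the 560/1024 and 800/1024 scaling moves into the vertex shader, and the fragment shader samples with the interpolated varying unchanged. The varying name convertedTexCoords is reused from the question's local variable, and the fragment shader body is reduced to the texture fetches for illustration. The two shaders are separate source strings shown in one listing.

```glsl
// ---- Vertex shader ----
uniform mat4 modelViewProjectionMatrix;
attribute vec3 position;
attribute vec2 texCoords;
varying vec2 convertedTexCoords;

void main() {
    // Scale once per vertex instead of once per fragment.
    convertedTexCoords = texCoords * vec2(560.0 / 1024.0, 800.0 / 1024.0);
    gl_Position = modelViewProjectionMatrix * vec4(position, 1.0);
}

// ---- Fragment shader (separate source) ----
precision highp float;
varying vec2 convertedTexCoords;
uniform sampler2D texture;
uniform sampler2D drawingTexture;
uniform sampler2D paperTexture;

void main() {
    // The varying is used as-is, so these are no longer dependent reads.
    vec4 bgImageColor = texture2D(texture, convertedTexCoords);
    vec4 paperColor = texture2D(paperTexture, convertedTexCoords);
    vec4 drawingColor = texture2D(drawingTexture, convertedTexCoords);
    // Placeholder output; the real shader would apply its area logic here.
    gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb - drawingColor.rgb, 1.0);
}
```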
There are also smaller things you can do to tweak this. For example,
gl_FragColor = vec4((paperColor.rgb * bgImageColor.rgb - drawingColor.rgb) * 0.4, 1.0);
should be slightly faster than
gl_FragColor = vec4(paperColor.rgb * bgImageColor.rgb * 0.4 - drawingColor.rgb * 0.4, 1.0);
The editor will live-compile your shader, so you can try out these manipulations in code and see the results in terms of estimated GPU cycles.

How to emulate GL_DEPTH_CLAMP_NV?

I have a platform where this extension is not available (non-NVIDIA).
How could I emulate this functionality?
I need it to solve far plane clipping problem when rendering stencil shadow volumes with z-fail algorithm.
Since you say you're using OpenGL ES, but also mentioned trying to clamp gl_FragDepth, I'm assuming you're using OpenGL ES 2.0, so here's a shader trick:
You can emulate ARB_depth_clamp by using a separate varying for the z-component.
Vertex Shader:
varying float z;
void main()
{
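// note: ftransform() is desktop (compatibility) GLSL; on OpenGL ES 2.0, multiply your position attribute by your own MVP uniform instead
gl_Position = ftransform();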
// transform z to window coordinates
z = gl_Position.z / gl_Position.w;
z = (gl_DepthRange.diff * z + gl_DepthRange.near + gl_DepthRange.far) * 0.5;
// prevent z-clipping
gl_Position.z = 0.0;
}
Fragment shader:
varying float z;
void main()
{
gl_FragColor = vec4(vec3(z), 1.0);
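// note: on OpenGL ES 2.0, writing fragment depth requires the GL_EXT_frag_depth extension (gl_FragDepthEXT)
gl_FragDepth = clamp(z, 0.0, 1.0);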
}
"Fall back" to ARB_depth_clamp?
Check if NV_depth_clamp exists anyway? For example my ATI card supports five "NVidia-only" GL extensions.

Resources