Metal fragment shader bound resource acting erratically on macOS

This problem has been stumping me, and I can't figure it out. I've got a few different shaders which use the same resource, a structure with a bunch of lighting values. The first shader to use it works fine -- the second one does not. Seems like it might be getting zeroes. The third shader to use it also works fine.
If I don't have any objects which use the first shader, then the second one works -- and the third one does NOT. The resource never changes; it's an MTLBuffer I set once. And if I do a GPU frame capture, the values reported in the draw calls are all correct, all the time. Yet nothing shows up for the shaders that don't work.
The reason I think it's this one particular struct that is fluctuating is that if I hard-code the values into the shader instead of reading them from the struct, then that also works. It's driving me crazy.
I am not sure what kind of code examples will serve here. Here is the structure and how I am using it. I'm not binding any other resources to this particular index, anywhere else in the program.
struct Light {
    packed_float3 color;      // 0 - 2
    float ambientIntensity;   // 3
    packed_float3 direction;  // 4 - 6
    float diffuseIntensity;   // 7
    float shininess;          // 8
    float specularIntensity;  // 9
    float dummy1, dummy2;
    /*
     ___________________________
    | 0 1 2 3 | 4 5 6 7 | 8 9   |
    |---------|---------|-------|
    | chunk0  | chunk1  | chunk2|
     ---------------------------
    */
} __attribute__ ((aligned (16)));
... shader code ....
float4 ambientColor = float4(1,1,1,1) * color * 0.25;
//Diffuse
float diffuseFactor = max(0.0,dot(interpolated.normal, float3(0.0,0.5,-1.0))); // 1
float4 diffuseColor = float4(light.color,1) * color * 0.5 * diffuseFactor ; // 2
//Specular
float3 eye = normalize(interpolated.fragmentPosition); //1
float3 reflection = reflect(float3(0.0,0.5,-1.0),interpolated.normal); // 2
float specularFactor = pow(max(0.0, dot(reflection, eye)), 10.0); //3
float4 specularColor = float4(float3(1,1,1)* 2 * specularFactor ,color.w);//4
color = (ambientColor + diffuseColor + specularColor);
color = clamp(color, 0.0, 1.0);
Anyone have any ideas about what could be the trouble? Also, this is only happening on the Mac. On iOS it works fine.
EDIT: It seems like when the shader is failing, it is simply unable to access the resource at all. For example, I tried just setting my fragment return color to equal the light color, with no modifications.
And it produces nothing -- not even black! If I replace that statement with either of the first two lines below, I get white or black; the third (the light color) produces nothing at all:
color = float4(1,1,1, 1); // produces white
color = float4(0,0,0, 1); //produces black
color = float4(light.color.xyz, 1); // nothing at all
It is like the fragment shader is just aborting when it is trying to access the light structure -- otherwise I would get SOMETHING in my color, with a forced alpha value of 1. I don't get it.

Related

Three.js Get local position of vertex in shader, is that even what I need?

I am attempting to implement this technique of rendering grass into my three.js app.
http://davideprati.com/demo/grass/
On level terrain at y position 0, everything looks absolutely fantastic!
Problem is, my app (game) has the terrain modified by a heightmap so very few (if any) positions on that terrain are at y position 0.
It seems this vertex shader animation code assumes the grass object is sitting at y position 0 for the following vertex shader code to work as intended:
if (pos.y > 1.0) {
    float noised = noise(pos.xy);
    pos.y += sin(globalTime * magnitude * noised);
    pos.z += sin(globalTime * magnitude * noised);
    if (pos.y > 1.7) {
        pos.x += sin(globalTime * noised);
    }
}
This condition works on the assumption that terrain is flat and at position 0, so that only vertices above the ground animate. Well.. umm.. since all vertices are above 1 with a heightmap (mostly), some strange effects occur, such as grass sliding all over the place lol.
Is there a way to do this where I can specify a y position threshold based more on the sprite than its world position? Or is there a better way all together to deal with this "slidy" problem?
I am an extreme noobie when it comes to shader code =]
Any help would be greatly appreciated.
I have no idea what I'm doing.
EDIT: OK, I think the issue is that I am altering the y position of each mesh merged into the main grass container geometry based on the y position of the terrain it sits on. I guess the shader is looking at the local position, but since the geometry itself is vertically displaced, the shader doesn't know how to compensate. Hmm…
Ok, I made a fiddle that demonstrates the issue:
https://jsfiddle.net/titansoftime/a3xr8yp7/
Change the value on line# 128 to a 1 instead of 2 and everything looks fine. Not sure how to go about fixing this.
Also, I have no idea why the colors are doing that, they look fine in my app.
If I understood the question correctly:
You are right in asking for the "local" position. Let's say a single strand of grass is a narrow strip, with some height segments.
If you want this to be modular, easy to scale and such, it would most likely extend in some direction in the 0-1 range. Let's say it has three segments along that direction, which would yield vertices with coordinates [0.0, 0.333, 0.666, 1.0]. It makes slightly more sense than an arbitrary range, because it's easy to reason that 0 is the ground and 1 is the tip of the blade.
This is the "local" or model space. When you multiply this with the modelMatrix you transform it to world space (call it localToWorld).
In the shader it could look something like this:
void main() {
    vec4 localPosition = vec4(position, 1.);
    vec4 worldPosition = modelMatrix * localPosition;
    vec4 viewPosition = viewMatrix * worldPosition;
    vec4 projectedPosition = projectionMatrix * viewPosition; // either orthographic or perspective
    gl_Position = projectedPosition;
}
This is the classic "you have a scene graph node" which you transform. Depending on what you set for your mesh's position, rotation and scale, the worldPosition will differ, but the local position is always the same. You can't tell from the world position alone whether something is at the bottom or the top; any value is viable, since your terrain can be anything.
With this approach, you can write shader logic saying that if a vertex is at a height of 0 (or below some epsilon), don't animate it.
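A minimal sketch of that check in model space (globalTime is the uniform from the question's shader; the epsilon and the sway term are placeholders, not part of the original answer):
void main() {
    vec4 localPosition = vec4(position, 1.);
    // 0.0 is the root of the blade in model space, so only animate above some epsilon
    if (localPosition.y > 0.001) {
        localPosition.x += sin(globalTime); // placeholder sway, scale however you like
    }
    gl_Position = projectionMatrix * modelViewMatrix * localPosition;
}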
So this brings us to logic that works in some assumed space (you have rules for 1.0 and 1.7).
Because you are translating the geometries and merging them, you no longer have this user-friendly model space. These blades may very well skip the localToWorld transformation entirely (it may well end up being just an identity matrix).
This obviously messes up your logic for selecting the vertices.
If you have to take the approach of distributing them like that, then you need another channel to carry the meaning of that local space, even if you only use it for this animation.
Two suitable channels already exist: UVs and vertex color. UVs you can imagine as another flat mesh, in another space, that maps onto the mesh you are rendering. But in this particular case it seems like you could use a custom attribute, say a float called aBladeHeight.
void main() {
    vec4 worldPosition = vec4(position, 1.); // you "burnt/baked" this transformation in, so no need to go from local to world in the shader
    vec2 localPosition = uv; // the grass in 2D, not transformed onto your terrain

    // This check knows what's at the bottom of the grass blade,
    // rather than what's on the ground (it has no idea where the ground is).
    if (localPosition.y > 0.0) {
        // Since a useful local space no longer exists, the only space we work in is world space.
        // We apply the animation in that space, but the filter is the check above,
        // in uv space, where we know what's the bottom and what's the top.
        worldPosition.xy += myLogic();
    }
    gl_Position = projectionMatrix * viewMatrix * worldPosition;
}
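For completeness, a hedged sketch of the custom-attribute variant mentioned above (aBladeHeight is a hypothetical per-vertex float, 0 at the root and 1 at the tip, that you would fill in on the JavaScript side when merging the blades; globalTime is the uniform from the question):
attribute float aBladeHeight; // hypothetical: baked in per vertex at merge time

void main() {
    vec4 worldPosition = vec4(position, 1.); // positions are already baked into terrain/world space
    // The attribute carries the "local" height that the merge destroyed,
    // so the threshold no longer depends on where the terrain is.
    if (aBladeHeight > 0.0) {
        worldPosition.x += sin(globalTime) * aBladeHeight; // placeholder sway, stronger toward the tip
    }
    gl_Position = projectionMatrix * viewMatrix * worldPosition;
}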
To mimic the "local space"
void main() {
    vec4 localSpace = vec4(uv, 0., 1.);
    gl_Position = projectionMatrix * modelViewMatrix * localSpace;
}
And all the blades would render overlapping each other.
EDIT
With instancing the shader would look something like this:
attribute vec4 aInstanceMatrix0; // three vec4s encode a matrix4:
attribute vec4 aInstanceMatrix1; // .xyz of each holds a column of the rotation/scale part,
attribute vec4 aInstanceMatrix2; // .w of each holds one component of the translation
// (the bottom row is always 0,0,0,1, so the whole matrix fits in three attributes)

void main() {
    vec4 localPos = vec4(position, 1.); // the local position is intact, it's the normalized 0-1 blade

    // do your thing in local space
    if (localPos.y > foo) {
        localPos.xz += myLogic();
    }

    // Notice the difference: instead of using the modelMatrix, you rebuild the matrix
    // from the instance attributes in its place.
    mat4 localToWorld = mat4(
        vec4(aInstanceMatrix0.xyz, 0.),
        vec4(aInstanceMatrix1.xyz, 0.),
        vec4(aInstanceMatrix2.xyz, 0.),
        vec4(aInstanceMatrix0.w, aInstanceMatrix1.w, aInstanceMatrix2.w, 1.)
    );

    // You can still use the modelMatrix on top of this if you want to move the ENTIRE
    // hill with all the grass via .position.set().
    vec4 worldPos = localToWorld * localPos;
    gl_Position = projectionMatrix * viewMatrix * worldPos;
}
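(On the JavaScript side, this kind of per-instance data would typically come from an InstancedBufferGeometry with InstancedBufferAttributes, or an InstancedMesh in newer three.js versions; the exact setup is beyond this sketch.)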

Metal normal doesn't interpolate

I've been learning how Metal works using Swift and targeting macOS. Things have been going okay, but now, close to getting the stuff done, I've hit a problem that I cannot possibly understand ... I hope you guys will help me :)
I'm loading and displaying an OBJ teapot, which I'm lighting using ambient + diffuse + specular light. Lighting in itself works well, but the problem is: the normal vector is not interpolated when going to the fragment shader, which results in flat lighting on a supposedly curved surface ... Not good ...
I really don't understand why the normal is not interpolated while the other values (position + eye) are ... Here is my shader and an image to show the result:
Thanks in advance :)
struct Vertex
{
    float4 position;
    float4 normal;
};

struct ProjectedVertex
{
    float4 position [[position]];
    float3 eye;
    float3 normal;
};

vertex ProjectedVertex vertex_project(device Vertex *vertices [[buffer(0)]],
                                      constant Uniforms &uniforms [[buffer(1)]],
                                      uint vid [[vertex_id]])
{
    ProjectedVertex outVert;
    outVert.position = uniforms.modelViewProjectionMatrix * vertices[vid].position;
    outVert.eye = -(uniforms.modelViewProjectionMatrix * vertices[vid].position).xyz;
    outVert.normal = (uniforms.modelViewProjectionMatrix * float4(vertices[vid].normal)).xyz;
    return outVert;
}
fragment float4 fragment_light(ProjectedVertex vert [[stage_in]],
                               constant Uniforms &uniforms [[buffer(0)]])
{
    float3 ambientTerm = light.ambientColor * material.ambientColor;
    float3 normal = normalize(vert.normal);
    float diffuseIntensity = saturate(dot(normal, light.direction));
    float3 diffuseTerm = light.diffuseColor * material.diffuseColor * diffuseIntensity;
    float3 specularTerm(0);
    if (diffuseIntensity > 0)
    {
        float3 eyeDirection = normalize(vert.eye);
        float3 halfway = normalize(light.direction + eyeDirection);
        float specularFactor = pow(saturate(dot(normal, halfway)), material.specularPower);
        specularTerm = light.specularColor * material.specularColor * specularFactor;
    }
    return float4(ambientTerm + diffuseTerm + specularTerm, 1);
}
screenshot
So the problem was that in the Objective-C version, when I indexed the vertices from the OBJ file, I generated only one vertex for vertices shared between surfaces, so I kept only one normal.
When translating it to Swift, the hash value I used to check whether a vertex is at the same place as one I already had was wrong and couldn't detect shared vertices. That resulted in keeping all of the duplicated normals, so each surface is lit flat.
I don't know if I'm being clear enough, but that's what happened. For future reference, this question was about making a Swift version of the "metalbyexample" book, which is Obj-C only.

Transition shader

I have created a transition shader.
This is what it does:
On each update, the color that should be alpha changes.
Then perform a check for each pixel:
If the color of the pixel is more than the 'alpha' value,
set this pixel to transparent.
Else, if the color of the pixel is more than the 'alpha' value - 50,
set this pixel to partly transparent.
Else,
set the color to black.
EDIT (DELETED OLD PARTS):
I tried converting my GLSL into AGAL (using http://cmodule.org/glsl2agal):
Fragment shader:
const float alpha = 0.8;
varying vec2 TexCoord;        // not used but required for converting
uniform sampler2D transition; // not used but required for converting

void main()
{
    vec4 color = texture2D(transition, TexCoord.st); // not used but required for converting
    color.a = float(color.r < alpha);
    if (color.r >= (alpha - 0.1)) {
        color.a = 0.2 * (color.r - alpha - 0.1);
    }
    gl_FragColor = vec4(0, 0, 0, color.a);
}
And I've customized the output and added that to a (custom) Starling filter:
var fragmentShader:String =
    "tex ft0, v0, fs0 <2d, clamp, linear, mipnone> \n" + // copy color to ft0
    "slt ft0.w, ft0.x, fc0.x \n" +                       // alpha = red < inputAlpha
    "mov ft0.xyz, fc1.xyzz \n" +                         // set color to black
    "mov oc, ft0";

mShaderProgram = target.registerProgramFromSource(PROGRAM_NAME, vertexShader, fragmentShader);
It works, and when I set the filter's alpha it updates as expected. The only thing left is the partly-transparent step, but I have no idea how to do that.
Swap the loops on the Y and X coordinates. By using X in the inner loop you make better use of the L1 cache and the prefetcher of the CPU.
Some minor hints:
Remove the zeros for cleaner code:
const c:uint = a << 24
Verify that 255/50 is collapsed into a single constant by the compiler.
Don't go crazy doing it with BitmapData once you're using Starling.
I didn't get whether you're grayscaling it yourself or not. If not, just create a Starling filter for grayscale (the pixel shader below will do the trick):
tex ft0, v0, fs0 <2d,linear,clamp>
add ft1.x, ft0.x, ft0.y
add ft1.x, ft1.x, ft0.z
div ft1.x, ft1.x, fc0.x
mov ft0.xyz, ft1.xxx
mov oc, ft0
And for the alpha transition, just extend the Image class, implement IAnimatable, and add it to the Juggler. In advanceTime just do this.alpha -= VALUE;
Simple as that :)
Just going to elaborate a bit on Paxel's answer. I discussed the L1 caching with another developer, Jackson Dunstan: where the speed improvement comes from, and what other improvements can be made to code like this to see a performance gain.
Jackson then posted a blog entry, which can be read here: Take Advantage of CPU caching
I'll post some of the relevant items. First, the bitmap data is stored in memory by rows. The rows' memory addresses might look something like this:
row 1: 0 1 2 3 4 5
row 2: 6 7 8 9 10 11
row 3: 12 13 14 15 16 17
Now running your inner loop along the rows lets you leverage the L1 cache, since you read memory in order. So with X in the inner loop you'll read the first row as:
0 1 2 3 4 5
But if you were to do it Y first you'd read it as:
0 6 12 1 7 13
As you can see, you are bouncing around memory addresses, making it a slower process.
As for other optimizations that could be made, the suggestion is to cache your width and height getters, storing the properties in local variables. Also, Math.round() is pretty slow; replacing it would give a speed increase.

How to blur the outcome of a fragment shader?

I'm working on a shader that generates little clouds based on some mask images. Right now it works well, but I feel the result is missing something, and I thought a blur would be nice. I remember a basic blur algorithm where you apply a convolution between the image and a matrix of norm 1 (the bigger the matrix, the stronger the effect). The thing is, I don't know how to treat the current outcome of the shader as an image. So basically I want to keep the shader as is, but get it blurred. Any ideas? How can I integrate the convolution algorithm into the shader? Or does anyone know of another algorithm?
Cg code:
float Luminance( float4 Color ){
    return 0.6 * Color.r + 0.3 * Color.g + 0.1 * Color.b;
}

struct v2f {
    float4 pos : SV_POSITION;
    float2 uv_MainTex : TEXCOORD0;
};

float4 _MainTex_ST;

v2f vert(appdata_base v) {
    v2f o;
    o.pos = mul(UNITY_MATRIX_MVP, v.vertex);
    o.uv_MainTex = TRANSFORM_TEX(v.texcoord, _MainTex);
    return o;
}

sampler2D _MainTex;
sampler2D _Gradient;
sampler2D _NoiseO;
sampler2D _NoiseT;

float4 frag(v2f IN) : COLOR {
    half4 nO = tex2D (_NoiseO, IN.uv_MainTex);
    half4 nT = tex2D (_NoiseT, IN.uv_MainTex);
    float4 turbulence = nO + nT;
    float lum = Luminance(turbulence);
    half4 c = tex2D (_MainTex, IN.uv_MainTex);
    if (lum >= 1.0f){
        float pos = lum - 1.0f;
        if( pos > 0.98f ) pos = 0.98f;
        if( pos < 0.02f ) pos = 0.02f;
        float2 texCord = (pos, pos);
        half4 turb = tex2D (_Gradient, texCord);
        //turb.a = 0.0f;
        return turb;
    }
    else return c;
}
It appears to me that this shader is emulating alpha testing between a backbuffer-like texture (passed via the sampler2D _MainTex) and a generated cloud luminance (represented by float lum) mapped onto a gradient. This makes things trickier because you can't just fake a blur and let alpha blending take care of the rest. You'll also need to change your alpha testing routine to emulate an alpha blend instead or restructure your rendering pipeline accordingly. We'll deal with blurring the clouds first.
The first question you need to ask yourself is whether you need a screen-space blur. Seeing the mechanics of this fragment shader, I would think not -- you want to blur the clouds on the actual model. Given this, it should be sufficient to blur the underlying textures to get a blurred result -- except you're emulating alpha clipping, so you'll get rough edges. The question is what to do about those rough edges. That's where alpha blending comes in.
You can emulate alpha blending by using a lerp (linear interpolation) between the turb color and the c color with the lerp() function (depending on which shader language you're using). You'll probably want something that looks like return lerp(c, turb, 1 - pos); instead of return turb; ... I'd expect you'll want to tweak this continually until you understand it and start getting the results you want. (For example, you may prefer lerp(c, turb, 1 - pow(pos, 4)).)
In fact, you can try this last step (just adding the lerp) before modifying your textures to get an idea of what the alpha blending will do for you.
Edit: I hadn't considered the case where the _NoiseO and _NoiseT samplers were changing continually, so simply telling you to blur them was minimally useful advice. You can emulate blurring by using a multi-tap filter. The simplest way is to take uniformly spaced samples, weight them, and sum them together, resulting in your final color. (Typically you'll want the weights themselves to sum to 1.)
That being said, you may or may not want to do this on the _NoiseO and _NoiseT textures themselves -- you may want to create a screen-space blur instead, which may look more interesting to a viewer. In this case, the same concept applies, but you need to calculate the offset coordinates for each tap and then perform a weighted summation.
For example, if we were going with the first case and we wanted to sample from the _NoiseO sampler and blur it slightly, we could use this box filter (where all the weights are the same and sum to 1, thus performing an average):
// Untested code.
half4 nO = 0.25 * tex2D(_NoiseO, IN.uv_MainTex + float2(         0,          0))
         + 0.25 * tex2D(_NoiseO, IN.uv_MainTex + float2(         0, g_offset.y))
         + 0.25 * tex2D(_NoiseO, IN.uv_MainTex + float2(g_offset.x,          0))
         + 0.25 * tex2D(_NoiseO, IN.uv_MainTex + float2(g_offset.x, g_offset.y));
Alternatively, if we wanted the entire cloud output to appear blurry we'd wrap the cloud generation portion in a function and call it instead of tex2D() for the taps.
// More untested code.
half4 genCloud(float2 tc) {
    half4 nO = tex2D (_NoiseO, tc);
    half4 nT = tex2D (_NoiseT, tc);
    float4 turbulence = nO + nT;
    float lum = Luminance(turbulence);
    float pos = lum - 1.0;
    if( pos > 0.98f ) pos = 0.98f;
    if( pos < 0.02f ) pos = 0.02f;
    float2 texCord = (pos, pos);
    half4 turb = tex2D (_Gradient, texCord);
    // Figure out how you'd generate your alpha blending constant here for your lerp
    turb.a = ACTUAL_ALPHA;
    return turb;
}
And the multi-tap filtering would look like:
// And even more untested code.
half4 cloudcolor = 0.25 * genCloud(IN.uv_MainTex + float2(         0,          0))
                 + 0.25 * genCloud(IN.uv_MainTex + float2(         0, g_offset.y))
                 + 0.25 * genCloud(IN.uv_MainTex + float2(g_offset.x,          0))
                 + 0.25 * genCloud(IN.uv_MainTex + float2(g_offset.x, g_offset.y));
return lerp(c, cloudcolor, cloudcolor.a);
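(Note that g_offset is not defined in these snippets; it would presumably be something like one texel, e.g. a float2(1.0/textureWidth, 1.0/textureHeight) passed in as a shader parameter, and widening it spreads the blur further.)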
However, doing this is going to be relatively slow if you make the cloud function too complex. If you're bound by raster operations and texture reads (transferring texture/buffer data to and from memory), chances are this won't matter much unless you use a much more advanced blurring technique (such as successive downsampling through ping-ponged buffers, useful for blurs/filters that are expensive because they have lots of taps). But performance is an entirely separate consideration from just getting the look you want.

Numeric Stability with Summed Area Tables in Shadow Mapping

I'm having an issue with loss of precision in my SAVSM (summed-area variance shadow map) setup.
When you see the light moving around, the effect is very striking; there is a lot of noise, with fragments going black and white all the time. This can be somewhat lessened by using the min-variance (thus ignoring anything below a certain threshold), but then we get even worse effects with incorrect falloff (see my other post).
I'm using GLSL 1.2 because I'm on a Mac, so I don't have access to the modf function in order to split the precision across two channels as described in GPU Gems 3, Chapter 8.
I'm using GL_RGBA32F_ARB textures with a framebuffer object, ping-ponging two textures to generate a summed-area table which I use with the VSM algorithm.
Moments / Depth Shader to create the basis for the tables
varying vec4 v_position;
varying float tDepth;

float g_DistributeFactor = 1024.0;

void main()
{
    // Is this linear depth? I would say yes but one can't be utterly sure.
    // Could try a divide by the far plane?
    float depth = v_position.z / v_position.w;
    depth = depth * 0.5 + 0.5; // Don't forget to move away from the unit cube ([-1,1]) to the [0,1] coordinate system

    vec2 moments = vec2(depth, depth * depth);

    // Adjusting moments (this is sort of bias per pixel) using derivative
    float dx = dFdx(depth);
    float dy = dFdy(depth);
    moments.y += 0.25 * (dx*dx + dy*dy);

    // Subtract 0.5 off now so we can get this into our summed area table calc
    //moments -= 0.5;

    // Split the moments into rg and ba for EVEN MORE PRECISION
    // float FactorInv = 1.0 / g_DistributeFactor;
    // gl_FragColor = vec4(floor(moments.x) * FactorInv, fract(moments.x) * g_DistributeFactor,
    //                     floor(moments.y) * FactorInv, fract(moments.y) * g_DistributeFactor);

    gl_FragColor = vec4(moments, 0.0, 0.0);
}
The summed-area tables do seem to be working. I know this because I have a function that converts back from the summed table to the original depth map, and the two images look pretty much the same. I'm also using the -0.5 / +0.5 trick in order to get some more precision, but it doesn't seem to be helping.
My question is this: given that I'm on a Mac which has GLSL 1.2 only, how can I split the precision over two channels? If I could use those extra channels for space in the summed table then maybe that would work? I've seen some stuff that uses modf, but that isn't available to me.
Also, people have suggested 32-bit integer buffers, but I don't think I have support for those on my MacBook Pro.
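For what it's worth, a modf-free split can be written with floor() and fract(), which are both available in GLSL 1.2, so the commented-out lines in the shader above don't strictly need modf. A minimal sketch (the variable names here are made up, g_DistributeFactor is the constant from the shader above, and whether the split parts survive the summed-area accumulation without overflowing their range is a separate question):
float factor = g_DistributeFactor;   // 1024.0 in the shader above
float scaled = moments.x * factor;
float high = floor(scaled) / factor; // coarse part, goes in one channel
float low  = fract(scaled);          // fine part in [0,1), goes in another channel
// ... both parts pass through the summed-area table ...
// recombining after the table lookups (the split is linear, so sums recombine the same way):
float value = high + low / factor;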
