Emulating geometry shaders using compute shaders in Metal (macOS)

I'm trying to implement voxel cone tracing in Metal. One of the steps in the algorithm is to voxelize the geometry using a geometry shader. Metal does not have geometry shaders, so I was looking into emulating them with a compute shader. I pass my vertex buffer into the compute shader, do what a geometry shader would normally do, and write the result to an output buffer. I also add a draw command to an indirect buffer. I then use the output buffer as the vertex buffer for my vertex shader. This works fine, but I need twice as much memory for my vertices: one copy in the vertex buffer and one in the output buffer. Is there any way to pass the output of the compute shader directly to the vertex shader without storing it in an intermediate buffer? I don't need to keep the contents of the compute shader's output buffer; I just need to give the results to the vertex shader.
Is this possible? Thanks
EDIT
Essentially, I'm trying to emulate the following GLSL geometry shader:
#version 450

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

layout(location = 0) in vec3 in_position[];
layout(location = 1) in vec3 in_normal[];
layout(location = 2) in vec2 in_uv[];

layout(location = 0) out vec3 out_position;
layout(location = 1) out vec3 out_normal;
layout(location = 2) out vec2 out_uv;

void main()
{
    vec3 p = abs(cross(in_position[1] - in_position[0], in_position[2] - in_position[0]));
    for (uint i = 0; i < 3; ++i)
    {
        out_position = in_position[i];
        out_normal = in_normal[i];
        out_uv = in_uv[i];
        if (p.z > p.x && p.z > p.y)
        {
            gl_Position = vec4(out_position.x, out_position.y, 0, 1);
        }
        else if (p.x > p.y && p.x > p.z)
        {
            gl_Position = vec4(out_position.y, out_position.z, 0, 1);
        }
        else
        {
            gl_Position = vec4(out_position.x, out_position.z, 0, 1);
        }
        EmitVertex();
    }
    EndPrimitive();
}
For each triangle, I need to output a triangle with vertices at these new positions instead. The triangle vertices come from a vertex buffer and are drawn using an index buffer. I also plan on adding code for conservative rasterization (just increasing the size of the triangle a little), but it's not shown here. Currently, in the Metal compute shader, I use the index buffer to fetch the vertices, run the same code as the geometry shader above, and write the new vertices to another buffer, which I then use to draw.

Here's a very speculative possibility depending on exactly what your geometry shader needs to do.
I'm thinking you can do it sort of "backwards" with just a vertex shader and no separate compute shader, at the cost of redundant work on the GPU. You would do a draw as if you had a buffer of all of the output vertices of the geometry shader's output primitives. You would not actually have that buffer on hand, though; instead, you would construct a vertex shader that calculates them on the fly.
So, in the app code, calculate the number of output primitives and therefore the number of output vertices that would be produced for a given count of input primitives. Do a draw of the output primitive type with that many vertices.
You would not provide a buffer with the output vertex data as input to this draw.
You would provide the original index buffer and original vertex buffer as inputs to the vertex shader for that draw. The shader would calculate from the vertex ID which output primitive it's for, and which vertex of that primitive (e.g. for a triangle, vid / 3 and vid % 3, respectively). From the output primitive ID, it would calculate which input primitive would have generated it in the original geometry shader.
The shader would look up the indices for that input primitive from the index buffer and then the vertex data from the vertex buffer. (This would be sensitive to the distinction between a triangle list vs. triangle strip, for example.) It would apply any pre-geometry-shader vertex shading to that data. Then it would do the part of the geometry computation that contributes to the identified vertex of the identified output primitive. Once it has calculated the output vertex data, you can apply any post-geometry-shader vertex shading(?) that you want. The result is what it would return.
If the geometry shader can produce a variable number of output primitives per input primitive, well, at least you have a maximum number. So, you can draw the maximum potential count of vertices for the maximum potential count of output primitives. The vertex shader can do the computations necessary to figure out if the geometry shader would have, in fact, produced that primitive. If not, the vertex shader can arrange for the whole primitive to be clipped away, either by positioning it outside of the frustum or using a [[clip_distance]] property of the output vertex data.
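For example (a hypothetical sketch, not from the original answer), the culling branch in the Metal vertex shader could look like this, where primitive_would_be_emitted stands for whatever test your geometry logic implies:
// All three vertices of the triangle compute the same test, so either the
// whole primitive survives or all of its vertices land outside the clip
// volume (Metal clips z/w outside [0, 1]) and the triangle is discarded.
if (!primitive_would_be_emitted) {
    out.new_position = float4(0.0, 0.0, 2.0, 1.0); // z/w = 2 > 1: clipped
    return out;
}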
This avoids ever storing the generated primitives in a buffer. However, it causes the GPU to do some of the pre-geometry-shader vertex shader and geometry shader calculations repeatedly. It will be parallelized, of course, but may still be slower than what you're doing now. Also, it may defeat some optimizations around fetching indices and vertex data that may be possible with more normal vertex shaders.
Here's an example conversion of your geometry shader:
#include <metal_stdlib>
using namespace metal;

struct VertexIn {
    // May need packed types here depending on your vertex buffer layout.
    // Can't use [[attribute(n)]] for these because Metal isn't doing the vertex lookup for us.
    float3 position;
    float3 normal;
    float2 uv;
};

struct VertexOut {
    float3 position;
    float3 normal;
    float2 uv;
    float4 new_position [[position]];
};

vertex VertexOut foo(uint vid [[vertex_id]],
                     device const uint *indexes [[buffer(0)]],
                     device const VertexIn *vertexes [[buffer(1)]])
{
    VertexOut out;

    const uint triangle_id = vid / 3;
    const uint vertex_of_triangle = vid % 3;

    // indexes is for a triangle strip even though this shader is invoked for a triangle list.
    const uint index[3] = { indexes[triangle_id], indexes[triangle_id + 1], indexes[triangle_id + 2] };
    const VertexIn v[3] = { vertexes[index[0]], vertexes[index[1]], vertexes[index[2]] };

    float3 p = abs(cross(v[1].position - v[0].position, v[2].position - v[0].position));

    out.position = v[vertex_of_triangle].position;
    out.normal = v[vertex_of_triangle].normal;
    out.uv = v[vertex_of_triangle].uv;

    if (p.z > p.x && p.z > p.y)
    {
        out.new_position = float4(out.position.x, out.position.y, 0, 1);
    }
    else if (p.x > p.y && p.x > p.z)
    {
        out.new_position = float4(out.position.y, out.position.z, 0, 1);
    }
    else
    {
        out.new_position = float4(out.position.x, out.position.z, 0, 1);
    }

    return out;
}
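On the app side, the draw for this emulation might look like the following (a sketch in Swift; encoder, pipeline, indexBuffer, vertexBuffer, and indexCount are hypothetical names, and the buffer indices match the shader above):
// Each input triangle (3 indices) yields one output triangle (3 vertices),
// so the emulated draw issues exactly one vertex per index.
encoder.setRenderPipelineState(pipeline)
encoder.setVertexBuffer(indexBuffer, offset: 0, index: 0)   // [[buffer(0)]]
encoder.setVertexBuffer(vertexBuffer, offset: 0, index: 1)  // [[buffer(1)]]
encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: indexCount)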

Unfortunately, there is no way to do this (and a number of other things) in Metal without unneeded complication.
The API lacks critical features that are common in Vulkan, OpenGL, and DirectX...

Related

Calculating a transformation matrix to place an object on a sphere in glsl

I'm trying to generate some matrices to place trees on a planet on the GPU. The position of each tree is predetermined, based on a biome map and various heightmap data, but this data is GPU-resident, so I can't do this on the CPU. At the moment I'm instancing using the geometry shader; this will change to traditional instancing if performance is bad, and I'd then compute the model matrices for each tree in a compute shader.
I've got as far as trying to use a modified version of lookAt(), but I can't get it working, and even if I did, the trees would be perpendicular to the planet instead of standing up. I know I can define a rotation matrix using 3 axes: the normal of the sphere, a tangent, and a bitangent. Given that I don't care what direction these tangents and bitangents point at the moment, what would be a quick way to calculate this matrix in GLSL? Thanks!
void drawInstance(vec3 offset)
{
    // Grab the model's position from the model matrix
    vec3 modelPos = vec3(modelMatrix[3][0], modelMatrix[3][1], modelMatrix[3][2]);
    // Add the offset
    modelPos += offset;
    // Eye = where the new pos is; look in the x direction for now; planet is at origin, so up is just modelPos normalized
    mat4 m = lookAt(modelPos, modelPos + vec3(1, 0, 0), normalize(modelPos));
    // lookAt is intended as a camera matrix, fix this
    m = inverse(m);

    vec3 pos = gl_in[0].gl_Position.xyz;
    gl_Position = vp * m * vec4(pos, 1.0);
    EmitVertex();
    pos = gl_in[1].gl_Position.xyz;
    gl_Position = vp * m * vec4(pos, 1.0);
    EmitVertex();
    pos = gl_in[2].gl_Position.xyz;
    gl_Position = vp * m * vec4(pos, 1.0);
    EmitVertex();
    EndPrimitive();
}

void main()
{
    vp = proj * view;
    mvp = proj * view * modelMatrix;
    drawInstance(vec3(0, 20, 0));
    // drawInstance(vec3(0, 20, 0));
    // drawInstance(vec3(0, 20, -40));
    // drawInstance(vec3(40, 40, 0));
    // drawInstance(vec3(-40, 0, 0));
}
I would recommend taking a completely different approach.
First, don't use geometry shaders for replicating geometry. That's what glDrawArraysInstanced is for.
Second, it's hard to define such a matrix procedurally. This is related to the Hairy Ball Theorem.
Instead, I would generate a bunch of random rotations on the CPU. Use this method to create a uniformly distributed quaternion. Pass that quaternion to the vertex shader as a single vec4 instanced attribute. In the vertex shader (sketched after these steps):
1. Offset the tree vertex by (0, 0, radiusOfThePlanet) so that it's located at the north pole (assuming the Z-axis is up).
2. Apply the quaternion rotation (it will rotate around the planet center, so the tree stays on the surface).
3. Apply the planet model-view and camera projection matrices as usual.
This will yield an unbiased, uniformly distributed random set of trees.
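A minimal vertex-shader sketch of those three steps (hypothetical names; rotationQuat is the per-instance unit quaternion attribute, fed with glVertexAttribDivisor set to 1):
#version 330 core
layout(location = 0) in vec3 in_position;   // tree mesh vertex
layout(location = 1) in vec4 rotationQuat;  // per-instance unit quaternion

uniform float radiusOfThePlanet;
uniform mat4 modelView;
uniform mat4 projection;

// Rotate vector v by unit quaternion q.
vec3 quatRotate(vec4 q, vec3 v)
{
    return v + 2.0 * cross(q.xyz, cross(q.xyz, v) + q.w * v);
}

void main()
{
    // 1. Place the tree at the north pole (assuming Z-axis is up).
    vec3 p = in_position + vec3(0.0, 0.0, radiusOfThePlanet);
    // 2. Rotate about the planet center; the tree stays on the surface.
    p = quatRotate(rotationQuat, p);
    // 3. Apply the model-view and projection matrices as usual.
    gl_Position = projection * modelView * vec4(p, 1.0);
}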
I found a solution to the problem that allows me to place objects on the surface of a sphere, facing in the correct direction. Here is the code:
mat4 m = mat4(1);
vec3 worldPos = getWorldPoint(sphericalCoords);

// Add a small number to the world pos, then normalize it, so that it is a point on a
// unit sphere slightly different from the normalized world pos. The vector between
// them is a tangent. Change this value to rotate the object once placed on the sphere.
vec3 xAxis = normalize(normalize(worldPos + vec3(0.0, 0.2, 0.0)) - normalize(worldPos));

// Planet is at (0,0,0), so the world pos can be used as the normal, and therefore the y axis
vec3 yAxis = normalize(worldPos);

// We can cross the y and x axes to generate a bitangent to use as the z axis
vec3 zAxis = normalize(cross(yAxis, xAxis));

// This is our rotation matrix!
mat3 baseMat = mat3(xAxis, yAxis, zAxis);

// Fill this into our 4x4 matrix
m = mat4(baseMat);

// Translate m by the radius along the y axis to put it on the surface
mat4 m2 = transformMatrix(mat4(1), vec3(0, radius, 0));
m = m * m2;

// Multiply by the MVP to project correctly
m = mvp * m;

// Draw an instance of your object
drawInstance(m);

How can I iterate over a sampler2D with a loop?

I have some data encoded in a 2k by 2k floating-point texture. The data are longitude, latitude, time, and date as R, G, B, A. Those are all normalized, but for now that is not a problem; I can de-normalize them later if I want to.
What I need now is to iterate through the whole texture and find what longitude and latitude should be at that fragment coordinate. I assume that the whole atlas has normalized coordinates and maps onto the whole OpenGL context. Besides coordinates, I will filter the data by time and date, but that is an if condition that is easy to do. Because the pixel coordinates I have will not map exactly onto a texture coordinate, I will use a small delta value to fix that for now, and I will use that delta value to precompute other points that are close to that coordinate.
Now I get driver crashes on an iGPU (it should be out of memory or something similar) whenever I add anything inside the two nested for loops, even if I use a discard.
The code I have now is this.
NOTE: f_time is the filter for the time; for now I have a slider so that I can interact with the values.
precision mediump float;
precision mediump int;

const int maxTextureSize = 2048;

varying vec2 v_texCoord;

uniform sampler2D u_texture;
uniform float f_time;
uniform ivec2 textureDimensions;

void main(void) {
    float delta = 0.001; // bigger delta for now, just to make it work; then we tune it

    // compute 1 pixel in texture coordinates.
    vec2 onePixel = vec2(1.0, 1.0) / float(textureDimensions.x);
    vec2 position = (gl_FragCoord.xy / float(textureDimensions.x));

    vec4 color = texture2D(u_texture, v_texCoord);
    vec4 outColor = vec4(0.0);

    float dist_x = distance(color.r, gl_FragCoord.x);
    float dist_y = distance(color.g, gl_FragCoord.y);
    //float dist_x = distance(color.g, gl_PointCoord.s);
    //float dist_y = distance(color.b, gl_PointCoord.t);

    for (int i = 0; i < maxTextureSize; i++) {
        if (i >= textureDimensions.x) { // was `i <`, which broke out immediately
            break;
        }
        for (int j = 0; j < maxTextureSize; j++) {
            if (j >= textureDimensions.y) { // was `j <`, same inverted test
                break;
            }
            // This is where I was stuck: texture2D takes normalized coordinates,
            // so divide by the dimensions (+0.5 to sample at the texel center).
            vec2 texCoord = (vec2(float(i), float(j)) + 0.5) / vec2(textureDimensions);
            vec4 pixel = texture2D(u_texture, texCoord);
            if (pixel.r > f_time) {
                outColor = vec4(1.0, 1.0, 1.0, 1.0);
                // for now just break; no delta calculation to sum this point with
                // others for an approximation of other points in this pixel
                break;
            }
        }
    }

    // this works
    if (color.t > f_time) {
        //gl_FragColor = color; //vec4(1.0, 1.0, 1.0, 1.0);
    }
    gl_FragColor = outColor;
}
What you are trying to do is simply not feasible.
You are trying to access a texture up to four million times, all within a single fragment shader invocation.
The way modern GPUs usually detect infinite loop conditions is by seeing how long your shader runs, and then killing it if it has run for "too long", the length of which is usually sufficiently generous. Your code, which does up to 4 million texture accesses, will almost certainly trigger this condition.
Which typically leads to a GPU reset.
Generally speaking, the way you would find the position in a texture which is associated with some fragment is to do so directly. That is, create a 1:1 correspondence between screen fragment locations (gl_FragCoord) and texels in the texture. That way, your texture does not need to contain X/Y coordinates, and each fragment shader can access the data meant for that specific invocation.
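In GLSL that direct lookup is a one-liner (a sketch reusing the question's uniforms, assuming the texture and the render target have the same dimensions):
// 1:1 mapping: each fragment reads the texel at its own window position.
vec4 data = texture2D(u_texture, gl_FragCoord.xy / vec2(textureDimensions));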
What you're trying to do seems to be to pass a large table (four million elements) to the GPU, and then have the GPU process it. The ordering of values is (generally) irrelevant; any value could potentially modify any pixel. Some pixels don't have values applied to them, while others may have multiple values applied.
This is serial programmer thinking, not parallel thinking. The way you'd code that on the CPU is to walk each element in the table, look at where it goes, and build the results for each pixel.
In a parallel algorithm, you don't work that way. Each invocation needs to be able to instantly find the data in the table that applies to it. You should never be doing some kind of search through a table for your data. Especially not a linear search.
You need to think of this from the perspective of your fragment shader.
In your data table, for each position on the screen, there is a list of data values that apply to that screen position. Correct? What you need to do is make that list directly available to each fragment shader invocation. And since each fragment's list is not constant in size, you will need to use a linked list rather than a fixed-size array.
To do this, you build a texture the size of your render target. Each texel in the texture specifies the location in the data table of the first element that this fragment needs to process. This provides every fragment shader invocation with the location of its first element. Since some fragment shaders may have no data applied to them, you need to set aside some special texture coordinate value to represent "none".
The data in the data table consists of your time and date, but rather than "longitude/latitude", it has the texture coordinate of the next texel in the texture that applies for that fragment shader. This is how you make a linked list in shaders. Each location in the data table specifies the next location to be processed.
If that location was the last data to be processed, then the location will be the "none" value from before.
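A sketch of what that traversal could look like in the fragment shader (the layout is hypothetical: u_head holds the texture coordinate of each pixel's first entry, u_table stores (time, date, next.s, next.t) per entry, and a negative coordinate encodes "none"):
precision mediump float;

uniform sampler2D u_head;   // per-pixel: texcoord of the first entry, or "none"
uniform sampler2D u_table;  // per-entry: (time, date, next.s, next.t)
uniform float f_time;
varying vec2 v_texCoord;

const int MAX_ENTRIES = 64; // upper bound; GLSL ES loops need constant bounds

void main(void) {
    vec4 outColor = vec4(0.0);
    vec2 next = texture2D(u_head, v_texCoord).xy; // head pointer for this pixel
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (next.x < 0.0) break;                  // reached "none"
        vec4 entry = texture2D(u_table, next);
        if (entry.x > f_time)                     // the time filter
            outColor = vec4(1.0);
        next = entry.zw;                          // follow the link
    }
    gl_FragColor = outColor;
}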
You should also be using a buffer texture or an SSBO to hold your data table, rather than a 2D texture. It would make things much easier.
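For the desktop GL case, such a table declaration might look like this (GLSL 4.30+; SSBOs are not available in WebGL 1.0, so this is only a sketch of the suggestion above):
// The data table as a shader storage buffer instead of a 2D texture:
// entries can then be indexed directly with an int, no texel addressing.
layout(std430, binding = 0) buffer DataTable {
    vec4 entries[]; // (time, date, nextIndex, unused)
};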

Retrieve vertex data in THREE.js

I'm creating a mesh with a custom shader. Within the vertex shader I'm modifying the original positions of the geometry's vertices. Then I need to access these new vertex positions from outside the shader; how can I accomplish this?
In lieu of transform feedback (which WebGL 1.0 does not support), you will have to use a passthrough fragment shader and floating-point texture (this requires loading the extension OES_texture_float). That is the only approach to generate a vertex buffer on the GPU in WebGL. WebGL does not support pixel buffer objects either, so reading the output data back is going to be very inefficient.
Nevertheless, here is how you can accomplish this:
This will be a rough overview focusing on OpenGL rather than anything Three.js specific.
First, encode your vertex array this way (add a 4th component for index):
Vec4 pos_idx : xyz = Vertex Position, w = Vertex Index (0.0 through NumVerts-1.0)
Storing the vertex index as the w component is necessary because OpenGL ES 2.0 (WebGL 1.0) does not support gl_VertexID.
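For example (a JavaScript sketch with hypothetical names), the index can be appended while building the buffer:
// Pack xyz position plus the vertex index as w, standing in for gl_VertexID.
var data = new Float32Array(numVerts * 4);
for (var i = 0; i < numVerts; ++i) {
  data[i * 4 + 0] = positions[i * 3 + 0];
  data[i * 4 + 1] = positions[i * 3 + 1];
  data[i * 4 + 2] = positions[i * 3 + 2];
  data[i * 4 + 3] = i;
}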
Next, you need a 2D floating-point texture:
MaxTexSize = Query GL_MAX_TEXTURE_SIZE
Width = MaxTexSize;
Height = max (ceil (NumVerts / Width), 1); // enough rows to hold every vertex
Create an RGBA floating-point texture with those dimensions and use it as FBO color attachment 0.
Vertex Shader:
#version 100
attribute vec4 pos_idx;

uniform int width;  // Width of floating-point texture
uniform int height; // Height of floating-point texture

varying vec4 vtx_out;

void main (void)
{
  float idx = pos_idx.w;

  // Position this vertex so that it occupies a unique pixel, offset by half a
  // pixel so the point lands on the pixel center. GLSL ES 1.00 has no integer
  // `%` operator, so use mod () on floats instead.
  vec2 xy_idx = (vec2 (mod (idx, float (width)),
                       floor (idx / float (width))) + 0.5)
                / vec2 (float (width), float (height)) * 2.0 - 1.0;
  gl_Position = vec4 (xy_idx, 0.0, 1.0);

  //
  // Do all of your per-vertex calculations here, and output to vtx_out.xyz
  //

  // Store the index in the W component
  vtx_out.w = idx;
}
Passthrough Fragment Shader:
#version 100
varying vec4 vtx_out;

void main (void)
{
  gl_FragData [0] = vtx_out;
}
Draw and Read Back:
// Draw your entire vertex array for processing (as GL_POINTS)
glDrawArrays (GL_POINTS, 0, NumVerts);

// With the FBO still bound, read the results back into an array `verts`.
// (WebGL has no glGetTexImage, so read from the framebuffer instead.)
glReadPixels (0, 0, Width, Height, GL_RGBA, GL_FLOAT, verts);

Repeat texture like stipple

I'm using orthographic projection.
I have 2 triangles creating one long quad.
On this quad I put a texture that repeats itself along the way.
The user is constantly changing the world zoom, which makes the quad longer or shorter accordingly. The height is calculated in the shader, so it is always the same size (in pixels).
My problem is that I want the texture to repeat according to its real (pixel) size and the length of the quad. In other words, the texture should always stay the same size (in pixels) and fill the quad by repeating more or fewer times depending on the quad's length.
The rotation is important.
For example, my texture is a small repeating pattern (image omitted). I've added texture coordinates to my vertices to repeat it 20 times, as the screenshots showed: zoomed out too far, the texture looks squeezed; zoomed in, it looks stretched. It always repeats exactly 20 times.
I'm sure I have to play with the texture coordinates in the fragment shader, but I don't see the solution; or perhaps there is a better solution to my problem.
---- ADDITION ----
Solved it by calculating the repeat S value at the current zoom (when I'm adding the vertices) and sending the map width (in world units) as an attribute. Every draw, I send the current map width as a uniform for calculating the scale.
But I'm not happy with this solution.
OK, I found a way to do it with minimal attributes and minimal code in the shader.
Do once:
- Calculate the repeat count for each line. As my world and my screen are 1:1 (1 world unit is 1 pixel), that is LineDistance(inWorldUnits) / picWidth(inScreenUnits).
- Save it as an attribute.
Every draw:
- Calculate the world-to-screen scale: WorldWidth / ScreenWidth.
- Set it as a uniform.
- Draw the buffer.
In the frag shader, simply multiply this scale with the repeat attribute, as sketched below.
Works perfectly and looks good. Resizing the window is supported as well.
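A sketch of that fragment-shader side (hypothetical names: v_repeat carries the per-line repeat attribute through a varying, u_scale is the per-draw world-to-screen scale, and the texture uses GL_REPEAT wrapping):
precision mediump float;

varying vec2 v_texcoord;   // 0..1 along the line
varying float v_repeat;    // per-line repeat count, passed through from the attribute
uniform float u_scale;     // world-to-screen scale, set once per draw
uniform sampler2D u_texture;

void main() {
    // Scale the S coordinate; GL_REPEAT wrapping does the tiling.
    gl_FragColor = texture2D(u_texture, vec2(v_texcoord.s * v_repeat * u_scale, v_texcoord.t));
}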
The general solution is to include a texture matrix. So your vertex shader might look something like
attribute vec4 a_position;
attribute vec2 a_texcoord;

varying vec2 v_texcoord;

uniform mat4 u_matrix;
uniform mat4 u_texMatrix;

void main() {
  gl_Position = u_matrix * a_position;
  v_texcoord = (u_texMatrix * vec4(a_texcoord, 0, 1)).xy;
}
Now you can set up the texture matrix to scale your texture coordinates however you need. If your texture coordinates go from 0 to 1 across the texture and your pattern is 16 pixels wide, then drawing a line 100 pixels long needs 100/16 as your X scale.
var pixelsLong = 100;
var pixelsTall = 8;
var textureWidth = 16;
var textureHeight = 16;

var xScale = pixelsLong / textureWidth;
var yScale = pixelsTall / textureHeight;

var texMatrix = [
  xScale, 0, 0, 0,
  0, yScale, 0, 0,
  0, 0, 1, 0,
  0, 0, 0, 1,
];

gl.uniformMatrix4fv(texMatrixLocation, false, texMatrix);
That seems like it would work. Because you're using a matrix, you can also easily offset or rotate the texture. See matrix math.

How do I instance in a geometry shader in DirectX11?

I know that in the vertex shader you do it like this:
PixelInputType TextureVertexShader(VertexInputType input)
{
    PixelInputType output;

    // Change the position vector to be 4 units for proper matrix calculations.
    input.position.w = 1.0f;

    // Update the position of the vertices based on the data for this particular instance.
    input.position.x += input.instancePosition.x;
    input.position.y += input.instancePosition.y;
    input.position.z += input.instancePosition.z;

    // Calculate the position of the vertex against the world, view, and projection matrices.
    output.position = mul(input.position, worldMatrix);
    output.position = mul(output.position, viewMatrix);
    output.position = mul(output.position, projectionMatrix);

    // Store the texture coordinates for the pixel shader.
    output.tex = input.tex;

    return output;
}
What would be the equivalent of using instancePosition in a geometry shader? For example, when I want to instance a model made of a single vertex, and for each instance make a quad in the geometry shader, setting the quad's position to the instancePosition of the corresponding instance in the instance buffer.
In order to get the data into your geometry shader, you can simply pass it through from the VS to the GS, so your VS output structure would look like:
struct GS_INPUT
{
    float4 vertexposition : POSITION;
    float4 instanceposition : TEXCOORD0; // Choose any semantic you like here
    float2 tex : TEXCOORD1;
    // Add any other data your geometry shader is going to need
};
Then, in your vertex shader, pass the data straight through to your geometry shader (untransformed).
Depending on your input primitive topology, your geometry shader will then receive the data as points, lines, or full triangles. Since you mention you want to convert a point into a quad, the prototype would look like this:
[maxvertexcount(4)]
void GS(point GS_INPUT input[1], inout TriangleStream<PixelInputType> SpriteStream)
{
    PixelInputType output;

    // Prepare your 4 vertices to make a quad, transform them, and once they're
    // ready, use SpriteStream.Append(output); to emit them. They form a triangle
    // strip, so if you need a brand new triangle you can also use
    // SpriteStream.RestartStrip();
}
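A fleshed-out sketch of that body (hypothetical: worldMatrix, viewMatrix, projectionMatrix and a quad half-size halfSize are assumed to come from a constant buffer, and PixelInputType carries a position and texcoord as in the question's vertex shader):
[maxvertexcount(4)]
void GS(point GS_INPUT input[1], inout TriangleStream<PixelInputType> SpriteStream)
{
    // Offset the point by its instance position, then transform to view space.
    float4 center = input[0].vertexposition + float4(input[0].instanceposition.xyz, 0.0f);
    center = mul(center, worldMatrix);
    center = mul(center, viewMatrix);

    // Expand to a camera-facing quad in view space, emitted as a triangle strip.
    const float2 corners[4] = { float2(-1, -1), float2(-1, 1), float2(1, -1), float2(1, 1) };
    PixelInputType output;
    for (int i = 0; i < 4; ++i)
    {
        float4 pos = center + float4(corners[i] * halfSize, 0.0f, 0.0f);
        output.position = mul(pos, projectionMatrix);
        output.tex = corners[i] * 0.5f + 0.5f;
        SpriteStream.Append(output);
    }
}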
