Is there any way to enforce a 16 byte alignment for a uniform buffer in GLSL?

I'm targeting WebGL via wgpu and am running into an issue with uniform buffer alignment.
I am trying to use this uniform:
layout(set=0, binding=2, std140)
uniform TexSize {
ivec2 dimensions;
};
And I get an error BUFFER_BINDINGS_NOT_16_BYTE_ALIGNED.
Checking with the maintainers of wgpu, I was informed this was because of the flavor of GLSL used by WebGPU, and that the uniform buffer in my shader must be 16-byte-aligned.
I can solve this by padding the struct out to have a 16 byte alignment:
layout(set=0, binding=2, std140)
uniform TexSize {
ivec2 dimensions;
ivec2 padding;
};
But this seems rather inelegant. Is there any way to set the alignment of TexSize without just adding other members to pad it out?

No.
Uniform buffers require 16-byte (4-float) spacing; you need to add the padding.
See the Address Space Layout Constraints section of the WGSL specification: https://www.w3.org/TR/WGSL/#address-space-layout-constraints
GPUs do a great deal of 3- and 4-dimensional vector and matrix arithmetic on 16-byte vec4 values (four 32-bit floats), so they are heavily optimized for that data size. That is why uniform bindings are spaced in multiples of 16 bytes.
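If the explicit padding member bothers you, one slightly tidier option (a style choice on my part, not anything a spec mandates) is to widen the member to an ivec4 and use only its xy components, which gives the block a 16-byte size without a named pad:
layout(set=0, binding=2, std140)
uniform TexSize {
ivec4 dimensions; // .xy holds the real data, .zw is implicit padding
};
// in the shader body: ivec2 size = dimensions.xy;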

Related

GLSL: memory exhausted

I am working on a WebGL scene with ~100 different 2048 x 2048 px textures. I'm rendering points primitives, and each point has a texture index and texture uv offsets that indicate the region of the given texture that should be used on the point.
Initially, I attempted to pass each point's texture index as a varying value, then I attempted to pull the given texture from a sampler2D array using that index position. However, this yielded an error that one can only fetch sampler2D array values with a "constant integer expression", so now I'm using a gnarly if conditional to assign each point's texture index:
/**
* The fragment shader's main() function must define `gl_FragColor`,
* which describes the pixel color of each pixel on the screen.
*
* To do so, we can use uniforms passed into the shader and varyings
* passed from the vertex shader.
*
* Attempting to read a varying not generated by the vertex shader will
* throw a warning but won't prevent shader compiling.
**/
// set float precision
precision highp float;
// repeat identifies the size of each image in an atlas
uniform vec2 repeat;
// textures contains an array of textures with length n textures
uniform sampler2D textures[42];
// identify the uv values as a varying attribute
varying vec2 vUv; // blueprint uv coords
varying vec2 vTexOffset; // instance uv offsets
varying float vTexture; // set index of each object's vertex
void main() {
int textureIndex = int(floor(vTexture));
vec2 uv = vec2( gl_PointCoord.x, 1.0 - gl_PointCoord.y );
vec4 color = vec4(0.0); // declared here so it is still in scope at gl_FragColor below
// The block below is automatically generated
if (textureIndex == 0) { color = texture2D(textures[0], uv * repeat + vTexOffset ); }
else if (textureIndex == 1) { color = texture2D(textures[1], uv * repeat + vTexOffset ); }
else if (textureIndex == 2) { color = texture2D(textures[2], uv * repeat + vTexOffset ); }
else if (textureIndex == 3) { color = texture2D(textures[3], uv * repeat + vTexOffset ); }
[ more lines of the same ... ]
gl_FragColor = color;
}
If the number of textures is small, this works fine. But if the number of textures is large (e.g. 40) this approach throws:
ERROR: 0:58: '[' : memory exhausted
I've tried reading around on this error but still am not sure what it means. Have I surpassed the max RAM in the GPU? If anyone knows what this error means, and/or what I can do to resolve the problem, I'd be grateful for any tips you can provide.
More details:
Total size of all textures to be loaded: 58MB
Browser: recent Chrome
Graphics card: AMD Radeon R9 M370X 2048 MB graphics (stock 2015 OSX card)
There is a limit on how many samplers a fragment shader can access. It can be obtained via gl.getParameter(gl.MAX_TEXTURE_IMAGE_UNITS). It is guaranteed to be at least 8, and is typically 16 or 32.
To circumvent the limit, texture arrays are available in WebGL2, which also allow indexing layers with any variable. In WebGL1 your only option is atlases, but since your textures are already 2048 by 2048, you can't make them any bigger.
If you don't want to limit yourself to WebGL2, you will have to split your rendering into multiple draw calls with different textures bound.
Also consider that having 100 8-bit RGBA 2048x2048 textures uses up 1.6 gigabytes of VRAM. Texture compression via WEBGL_compressed_texture_s3tc can reduce that by 8x or 4x, depending on how much alpha precision you need.
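For reference, a sketch of the WebGL2 route in GLSL ES 3.00 (the names mirror the question's shader but are assumptions; the individual textures become layers of one array texture): with a sampler2DArray the layer index may be any integer expression, so the generated if-chain disappears.
#version 300 es
precision highp float;
precision highp sampler2DArray;
uniform sampler2DArray textures; // all former textures as layers of one array texture
uniform vec2 repeat;
in vec2 vTexOffset;
in float vTexture;
out vec4 fragColor;
void main() {
vec2 uv = vec2(gl_PointCoord.x, 1.0 - gl_PointCoord.y);
// the layer (z component) may be any expression, not just a constant
fragColor = texture(textures, vec3(uv * repeat + vTexOffset, floor(vTexture)));
}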

how can I iterate with a loop over a sampler2D

I have some data encoded in a floating-point texture, 2k by 2k. The data are longitude, latitude, time, and date as R, G, B, A. Those are all normalized, but for now that is not a problem; I can de-normalize them later if I want to.
What I need now is to iterate through the whole texture and find which longitude and latitude belong at each fragment coordinate. I assume the whole atlas has normalized coordinates and maps onto the whole OpenGL context. Besides coordinates I will filter the data by time and date, but that is an if condition that is easy to do. Because the pixel coordinates I have will not map exactly onto a texel, I will use a small delta value to fix that for now, and I will use that delta value to precompute other points that are close to that coordinate.
Now I get driver crashes on the iGPU (it looks like out of memory or something similar) whenever I add anything inside the two nested for loops, even just a discard.
The code I have now is this.
NOTE: f_time is the time filter; for now I drive it with a slider so that I have some interaction with the values.
precision mediump float;
precision mediump int;
const int maxTextureSize = 2048;
varying vec2 v_texCoord;
uniform sampler2D u_texture;
uniform float f_time;
uniform ivec2 textureDimensions;
void main(void) {
float delta = 0.001;// now bigger delta just to make it work then we tune it
// compute 1 pixel in texture coordinates.
vec2 onePixel = vec2(1.0, 1.0) / float(textureDimensions.x);
vec2 position = ( gl_FragCoord.xy / float(textureDimensions.x) );
vec4 color = texture2D(u_texture, v_texCoord);
vec4 outColor = vec4(0.0);
float dist_x = distance( color.r, gl_FragCoord.x);
float dist_y = distance( color.g, gl_FragCoord.y);
//float dist_x = distance( color.g, gl_PointCoord.s);
//float dist_y = distance( color.b, gl_PointCoord.t);
for(int i = 0; i < maxTextureSize; i++){
if(i >= textureDimensions.x ){
break; // stop once i passes the real texture width
}
for(int j = 0; j < maxTextureSize ; j++){
if(j >= textureDimensions.y ){
break; // stop once j passes the real texture height
}
// Where I am stuck now: how to get the texture coordinate and test it in the fragment shader
// the precomputation
vec4 pixel = texture2D(u_texture, (vec2(i, j) + 0.5) / vec2(textureDimensions));
if(pixel.r > f_time){
outColor = vec4(1.0, 1.0, 1.0, 1.0);
// for now just break, no delta calculation to sum this point with others so that
// we will have an approximation of other points into that pixel
break;
}
}
}
// this works
if(color.t > f_time){
//gl_FragColor = color;//;vec4(1.0, 1.0, 1.0, 1.0);
}
gl_FragColor = outColor;
}
What you are trying to do is simply not feasible.
You are trying to access a texture up to four million times, all within a single fragment shader invocation.
The way modern GPUs usually detect infinite loop conditions is by seeing how long your shader runs, and then killing it if it has run for "too long", the length of which is usually sufficiently generous. Your code, which does up to 4 million texture accesses, will almost certainly trigger this condition.
Which typically leads to a GPU reset.
Generally speaking, the way you would find the position in a texture which is associated with some fragment is to do so directly. That is, create a 1:1 correspondence between screen fragment locations (gl_FragCoord) and texels in the texture. That way, your texture does not need to contain X/Y coordinates, and each fragment shader can access the data meant for that specific invocation.
What you're trying to do seems to be to pass a large table (four million elements) to the GPU, and then have the GPU process it. The ordering of values is (generally) irrelevant; any value could potentially modify any pixel. Some pixels don't have values applied to them, while others may have multiple values applied.
This is serial programmer thinking, not parallel thinking. The way you'd code that on the CPU is to walk each element in the table, look at where it goes, and build the results for each pixel.
In a parallel algorithm, you don't work that way. Each invocation needs to be able to instantly find the data in the table that applies to it. You should never be doing some kind of search through a table for your data. Especially not a linear search.
You need to think of this from the perspective of your fragment shader.
In your data table, for each position on the screen, there is a list of data values that apply to that screen position. Correct? What you need to do is make that list directly available to each fragment shader invocation. And since each fragment's list is not constant in size, you will need to use a linked list rather than a fixed-size array.
To do this, you build a texture the size of your render target. Each texel in the texture specifies the location in the data table of the first element that this fragment needs to process. This provides every fragment shader invocation with the location of its first element. Since some fragment shaders may have no data applied to them, you need to set aside some special texture coordinate value to represent "none".
The data in the data table consists of your time and date, but rather than "longitude/latitude", it has the texture coordinate of the next texel in the texture that applies for that fragment shader. This is how you make a linked list in shaders. Each location in the data table specifies the next location to be processed.
If that location was the last data to be processed, then the location will be the "none" value from before.
You should also be using a buffer texture or an SSBO to hold your data table, rather than a 2D texture. It would make things much easier.
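To make the description concrete, here is a minimal GLSL ES sketch of that traversal (u_head, u_table, the node layout, and the negative "none" sentinel are all illustrative assumptions; ES 1.00 also forces a compile-time loop bound):
precision mediump float;
uniform sampler2D u_head; // per-fragment head pointer: coord of the first list node
uniform sampler2D u_table; // node: r = time, g = date, ba = coord of the next node
uniform vec2 u_resolution; // render target size
uniform float f_time;
const int MAX_NODES = 64; // ES 1.00 loops need a constant bound
void main(void) {
vec2 ptr = texture2D(u_head, gl_FragCoord.xy / u_resolution).xy;
vec4 outColor = vec4(0.0);
for (int i = 0; i < MAX_NODES; i++) {
if (ptr.x < 0.0) break; // hit the "none" sentinel: no more nodes
vec4 node = texture2D(u_table, ptr);
if (node.r > f_time) outColor = vec4(1.0);
ptr = node.ba; // follow the link to the next node
}
gl_FragColor = outColor;
}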

opengl artifacts using subroutine

I draw my scene using the glDrawElements function.
Since I want to reach the point where one draw call draws the complete scene,
I need a shader which switches between "materials" inside the shader.
I decided to use subroutines for the materials. Here is my fragment shader.
#version 440
layout(location = 0) flat in uvec2 inID_ShaderData;
layout(location = 1) in vec4 inPosition;
layout(location = 2) in vec2 inUV;
subroutine void shaderType(void);
subroutine uniform shaderType shaders[2];
uniform sampler2D texture0;
layout(location = 0) out vec4 outColor;
layout(location = 1) flat out uint outID;
subroutine(shaderType) void shader_flatColor(void)
{
outColor = vec4(1,0,0,1); // test red color
// outColor = unpackUnorm4x8(inID_ShaderData.y & 0x00ffffff); // this should be here normally
}
subroutine(shaderType) void shader_flatTexture(void)
{
outColor = vec4(0,0,1,1); // test blue color
// outColor = texture(texture0, inUV); // this should be here normally
}
void main()
{
uint shader = (inID_ShaderData.y >> 24) & 0xff; // extract subroutine index from attributes
shaders[ shader ](); // call subroutine - not working, makes artifacts
/* calling subroutine this way works ok
if (shader == 0) shaders[ 0 ]();
if (shader == 1) shaders[ 1 ]();
*/
outID = inID_ShaderData.x;
if (outID == -1) // this condition never happens
outColor = texture(texture0, inUV); // needed here to not to optimize out texture0, needed in soubroutine
}
Question 1:
When using shaders[shader](); there are pixel artifacts on the quads drawn.
When using the IFs, it works OK. Is it a driver bug, or am I doing something wrong?
How can this be achieved without IFs, using subroutines?
(I have a Radeon 7850 on Windows 8 64-bit)
Question 2:
In the second subroutine I want to use a texture. But if I don't use the sampler variable
in main(), the compiler "does not see it" in the subroutine, and the CPU-side glUniform
call fails.
Is there a way to do this right, without compiler cheats such as never-taken conditions?
P.S.: Sorry, I cannot post an image of the artifacts, but what should be red squares are
red squares with random blue pixels on some 5% of the area, mostly in the corners.
Blue squares have red pixels.
You simply cannot do that. Switching subroutines is not possible on a per-attribute basis. The GLSL spec has this to say about your attempt:
Subroutine variables may be declared as explicitly-sized arrays, which
can be indexed only with dynamically uniform expressions.
Thinking about how GPUs work, this restriction does make total sense. There simply is no separate control flow for every single shader invocation, but only for much larger groups.
When you use the if approach, you are using constant indices, which of course are also dynamically uniform. You could of course try to branch based on your input attribute, but you should be aware that forcing non-uniform control flow that way might totally ruin performance. In the worst case, it will not be significantly more efficient than executing all the subroutine functions every time, storing the results in an array, and selecting the final result from that array using your index.
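If the scene can afford one material per draw call, a legal variant (a sketch under that assumption; materialIndex is an illustrative name) is to take the index from a plain uniform, which is dynamically uniform by definition:
uniform uint materialIndex; // set once per draw call from the CPU side
void main()
{
shaders[materialIndex](); // legal: a plain uniform is dynamically uniform
outID = inID_ShaderData.x;
}
Of course this gives up the original goal of drawing the whole scene in one call, which is exactly the tension the quoted spec rule creates.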

Storing floats in a texture in OpenGL ES

In WebGL, I am trying to create a texture with texels each consisting of 4 float values. Here I attempt to create a simple texture with one vec4 in it.
var textureData = new Float32Array(4);
var texture = gl.createTexture();
gl.activeTexture( gl.TEXTURE0 );
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.texImage2D(
// target, level, internal format, width, height
gl.TEXTURE_2D, 0, gl.RGBA, 1, 1,
// border, data format, data type, pixels
0, gl.RGBA, gl.FLOAT, textureData
);
My intent is to sample it in the shader using a sampler like so:
uniform sampler2D data;
...
vec4 retrieved = texture2D(data, vec2(0.0, 0.0));
However, I am getting an error during gl.texImage2D:
WebGL: INVALID_ENUM: texImage2D: invalid texture type
WebGL error INVALID_ENUM in texImage2D(TEXTURE_2D, 0, RGBA, 1, 1, 0, RGBA, FLOAT,
[object Float32Array])
Comparing the OpenGL ES spec and the OpenGL 3.3 spec for texImage2D, it seems like I am not allowed to use gl.FLOAT. In that case, how would I accomplish what I am trying to do?
You can create a byte array from your float array. Each float takes 4 bytes (a 32-bit float). That array can be uploaded as a texture using the standard RGBA format with unsigned bytes, which gives you a texture where each texel contains a single 32-bit floating-point number; that seems to be exactly what you want.
The only problem is that each floating-point value is split across 4 channel values when you retrieve it from the texture in your fragment shader, so what you are looking for is most likely "how to convert a vec4 into a single float".
Note that an RGBA internal format holding 32-bit floats per channel will not work here: the texture will always be 32 bits per texel, so somehow forcing floats into it would result in clamping or precision loss. And even if a texel could hold four 32-bit RGBA floats, your shader would most likely still treat them as lowp via texture2D at some point.
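A minimal sketch of the byte-view trick this answer describes (the one-texel-per-float layout is an illustrative choice):
var floats = new Float32Array([0.25, 0.5, 0.75, 1.0]);
var bytes = new Uint8Array(floats.buffer); // the same memory viewed as raw bytes
gl.texImage2D(
gl.TEXTURE_2D, 0, gl.RGBA, floats.length, 1, 0,
gl.RGBA, gl.UNSIGNED_BYTE, bytes // each RGBA texel now holds one 32-bit float
);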
The solution to my problem is actually quite simple! I just needed to type
var float_texture_ext = gl.getExtension('OES_texture_float');
Now WebGL can use texture floats!
This MDN page tells us why:
Note: In WebGL, unlike in other GL APIs, extensions are only available if explicitly requested.
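Putting the extension check together with the original upload gives something like the sketch below. One caveat I believe applies here: in WebGL1, float textures are not filterable without the separate OES_texture_float_linear extension, so NEAREST filtering is the safe default.
var float_texture_ext = gl.getExtension('OES_texture_float');
if (!float_texture_ext) {
// fall back, e.g. to the byte-packing approach from the other answer
}
var textureData = new Float32Array(4);
var texture = gl.createTexture();
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, texture);
// NEAREST, because linear filtering of float textures needs OES_texture_float_linear
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, 1, 1, 0, gl.RGBA, gl.FLOAT, textureData);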

Packing float into vec4 - how does this code work?

I am trying to study shadow mapping in WebGL. I see the same piece of shader code copied in various libraries and examples that achieve this, but nowhere did I find an explanation of how it works.
The idea is to save a depth value (a single float) into the color buffer (a vec4). There is a pack function that saves a float into a vec4 and an unpack function that retrieves the float from the vec4.
vec4 pack_depth(const in float depth)
{
const vec4 bit_shift = vec4(256.0*256.0*256.0, 256.0*256.0, 256.0, 1.0);
const vec4 bit_mask = vec4(0.0, 1.0/256.0, 1.0/256.0, 1.0/256.0);
vec4 res = fract(depth * bit_shift);
res -= res.xxyz * bit_mask;
return res;
}
float unpack_depth(const in vec4 rgba_depth)
{
const vec4 bit_shift = vec4(1.0/(256.0*256.0*256.0), 1.0/(256.0*256.0), 1.0/256.0, 1.0);
float depth = dot(rgba_depth, bit_shift);
return depth;
}
I would have imagined that packing a float into a vec4 should be a trivial problem: just copy it into one of the 4 slots of the vec4 and leave the others unused. That's why the bit-shifting logic in the above code is puzzling to me.
Can anyone shed some light?
It's not storing a GLSL float in a GLSL vec4. What it's doing is storing a value in a vec4 which, when written to an RGBA8 framebuffer (a 32-bit value), can be read back as a vec4 and then reconstituted into the same float that was given previously.
If you did what you suggest, just writing the floating-point value to the red channel of the framebuffer, you'd only get 8 bits of accuracy. With this method, you get all 32 bits working for you.
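To make the digit trick concrete, here is a rough JavaScript analogue of the two functions (illustrative only). The depth in [0, 1) is treated as a base-256 fixed-point number; each channel keeps its own 8-bit "digit", and the subtraction strips the bits already carried by another channel so each channel survives quantization to 8 bits intact:
function pack_depth(depth) {
var r = (depth * 256.0 * 256.0 * 256.0) % 1.0; // fract(depth * bit_shift.x)
var g = (depth * 256.0 * 256.0) % 1.0;
var b = (depth * 256.0) % 1.0;
var a = depth % 1.0;
// res -= res.xxyz * bit_mask, using the pre-subtraction values like the swizzle does
return [r, g - r / 256.0, b - g / 256.0, a - b / 256.0];
}
function unpack_depth(rgba) {
// dot(rgba_depth, bit_shift): the contributions telescope back to the original depth
return rgba[0] / (256.0 * 256.0 * 256.0) + rgba[1] / (256.0 * 256.0)
+ rgba[2] / 256.0 + rgba[3];
}
// unpack_depth(pack_depth(0.34567)) recovers 0.34567 up to float rounding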
If you're interested in the nitty-gritty details of how these routines work, I suggest you read my blog post. I'm adding some details here on how that code works and to address some possible use cases.
As you probably figured out, that code is encoding a normalized float value into a vec4. OpenGL ES 2.0 or WebGL (at the time of writing) might make use of those pack/unpack routines to provide 32-bit-precision floating-point values via RGBA8 textures (more on this in the spec).
Even with the extension posted by Mikael (OES_texture_float), it might still be necessary (for debugging purposes, for instance) to dump normalized floating-point values at full 32-bit precision, and as described in the spec, readPixels is currently limited by the following:
Only two combinations of format and type are accepted. The first is format RGBA and type UNSIGNED_BYTE. The second is an implementation-chosen format.
In addition to the answer above, you might be interested in the floating point texture extension described here:
http://www.khronos.org/registry/webgl/extensions/OES_texture_float/
Note that there are hardware/software setups out there where this extension doesn't exist or doesn't run, but where it is available it sure is a good extension. My experience is that it's fast as well. If you use it, you can use the remaining three channels to store other information, such as color from a projected texture.
