How can I iterate with a loop over a sampler2D?

I have some data encoded in a 2k by 2k floating-point texture. The data are longitude, latitude, time, and date, stored as R, G, B, A. They are all normalized, but for now that is not a problem; I can de-normalize them later if I want to.
What I need now is to iterate through the whole texture and find which longitude/latitude entries belong at the current fragment coordinate. I assume that the whole atlas has normalized coordinates and that it maps onto the whole OpenGL context. Besides the coordinates, I will filter the data by time and date, but that is an if condition that is easy to do. Because the pixel coordinates I have will not map exactly onto a stored coordinate, I will use a small delta value to work around that for now, and I will use that delta value to precompute other points that are close to that coordinate.
Now I get driver crashes on an iGPU (probably out of memory or something similar) as soon as I add anything inside the two nested for loops, even if it is just a discard.
The code I have now is this:
NOTE: f_time is the filter for the time; for now I have a slider so that I have some interaction with the values.
precision mediump float;
precision mediump int;
const int maxTextureSize = 2048;
varying vec2 v_texCoord;
uniform sampler2D u_texture;
uniform float f_time;
uniform ivec2 textureDimensions;
void main(void) {
    float delta = 0.001; // now bigger delta just to make it work then we tune it
    // compute 1 pixel in texture coordinates.
    vec2 onePixel = vec2(1.0, 1.0) / float(textureDimensions.x);
    vec2 position = ( gl_FragCoord.xy / float(textureDimensions.x) );
    vec4 color = texture2D(u_texture, v_texCoord);
    vec4 outColor = vec4(0.0);
    float dist_x = distance( color.r, gl_FragCoord.x);
    float dist_y = distance( color.g, gl_FragCoord.y);
    //float dist_x = distance( color.g, gl_PointCoord.s);
    //float dist_y = distance( color.b, gl_PointCoord.t);
    for (int i = 0; i < maxTextureSize; i++) {
        // stop once i is past the actual texture width
        if (i >= textureDimensions.x) {
            break;
        }
        for (int j = 0; j < maxTextureSize; j++) {
            // stop once j is past the actual texture height
            if (j >= textureDimensions.y) {
                break;
            }
            // Where I am stuck now: how to get the texture coordinate and test it in the fragment shader
            // the precomputation
            // texture2D expects normalized coordinates, so sample the center of texel (i, j)
            vec4 pixel = texture2D(u_texture, (vec2(float(i), float(j)) + 0.5) / vec2(textureDimensions));
            if (pixel.r > f_time) {
                outColor = vec4(1.0, 1.0, 1.0, 1.0);
                // for now just break, no delta calculation to sum this point with others so that
                // we will have an approximation of other points into that pixel
                break;
            }
        }
    }
    // this works
    if (color.t > f_time) {
        //gl_FragColor = color;//;vec4(1.0, 1.0, 1.0, 1.0);
    }
    gl_FragColor = outColor;
}

What you are trying to do is simply not feasible.
You are trying to access a texture up to four million times, all within a single fragment shader invocation.
The way modern GPUs usually detect infinite loop conditions is by seeing how long your shader runs, and then killing it if it has run for "too long", the length of which is usually sufficiently generous. Your code, which does up to 4 million texture accesses, will almost certainly trigger this condition, which typically leads to a GPU reset.
Generally speaking, the way you would find the position in a texture which is associated with some fragment is to do so directly. That is, create a 1:1 correspondence between screen fragment locations (gl_FragCoord) and texels in the texture. That way, your texture does not need to contain X/Y coordinates, and each fragment shader can access the data meant for that specific invocation.
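A minimal sketch of such a direct lookup, assuming the data has been rearranged so that the texel at each screen position holds the entry for that position. The names u_data and u_resolution and the channel layout (time in .r, date in .g) are assumptions for illustration, not taken from the question.

```glsl
precision mediump float;

uniform sampler2D u_data;    // assumed: one texel per screen pixel, .r = time, .g = date
uniform vec2 u_resolution;   // assumed: render target size in pixels
uniform float f_time;        // time filter, as in the question

void main(void) {
    // gl_FragCoord.xy already points at the pixel center (x + 0.5, y + 0.5),
    // so dividing by the resolution lands on the center of the matching texel.
    vec2 uv = gl_FragCoord.xy / u_resolution;
    vec4 entry = texture2D(u_data, uv);

    // One texture fetch per fragment, no search.
    gl_FragColor = (entry.r > f_time) ? vec4(1.0) : vec4(0.0);
}
```

With this layout the cost per fragment is a single fetch, regardless of how many entries the table holds.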
What you're trying to do seems to be to pass a large table (four million elements) to the GPU, and then have the GPU process it. The ordering of values is (generally) irrelevant; any value could potentially modify any pixel. Some pixels don't have values applied to them, while others may have multiple values applied.
This is serial programmer thinking, not parallel thinking. The way you'd code that on the CPU is to walk each element in the table, look at where it goes, and build the results for each pixel.
In a parallel algorithm, you don't work that way. Each invocation needs to be able to instantly find the data in the table that applies to it. You should never be doing some kind of search through a table for your data. Especially not a linear search.
You need to think of this from the perspective of your fragment shader.
In your data table, for each position on the screen, there is a list of data values that apply to that screen position. Correct? What you need to do is make that list directly available to each fragment shader invocation. And since each fragment's list is not constant in size, you will need to use a linked list rather than a fixed-size array.
To do this, you build a texture the size of your render target. Each texel in the texture specifies the location in the data table of the first element that this fragment needs to process. This provides every fragment shader invocation with the location of its first element. Since some fragment shaders may have no data applied to them, you need to set aside some special texture coordinate value to represent "none".
The data in the data table consists of your time and date, but rather than "longitude/latitude", it has the texture coordinate of the next texel in the texture that applies for that fragment shader. This is how you make a linked list in shaders. Each location in the data table specifies the next location to be processed.
If that location was the last data to be processed, then the location will be the "none" value from before.
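The linked-list scheme described above could be traversed roughly as follows. This is only a sketch under several assumptions: both textures are floating-point with NEAREST filtering and no mipmaps, a negative x coordinate is used as the "none" value, and the names u_head, u_nodes, u_resolution and MAX_NODES are hypothetical. GLSL ES 1.00 requires a constant loop bound, hence the fixed cap on list length.

```glsl
precision highp float;

const int MAX_NODES = 64;    // assumed upper bound on entries per pixel

uniform sampler2D u_head;    // screen-sized: .xy = UV of the first node, x < 0.0 means "none"
uniform sampler2D u_nodes;   // data table: .r = time, .g = date, .ba = UV of the next node
uniform vec2 u_resolution;   // render target size in pixels
uniform float f_time;        // time filter, as in the question

void main(void) {
    // Each fragment starts at its own head pointer.
    vec2 node = texture2D(u_head, gl_FragCoord.xy / u_resolution).xy;
    float hits = 0.0;

    for (int n = 0; n < MAX_NODES; n++) {
        if (node.x < 0.0) {
            break;               // reached the "none" sentinel
        }
        vec4 entry = texture2D(u_nodes, node);
        if (entry.r > f_time) {
            hits += 1.0;         // this entry passes the time filter
        }
        node = entry.ba;         // follow the link to the next entry
    }

    // White if at least one entry for this pixel passed the filter.
    gl_FragColor = vec4(vec3(min(hits, 1.0)), 1.0);
}
```

Building the head texture and the links is a one-time CPU step; each fragment then touches only the entries that actually apply to it.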
You should also be using a buffer texture or an SSBO to hold your data table, rather than a 2D texture. It would make things much easier.

Related

Rendering to custom FrameBuffer using same texture both as input and output

Some fragment shaders on ShaderToy (e.g. fluid dynamics, https://www.shadertoy.com/view/4tGfDW ) use the same buffer as both input and output. But when I try to do this in my C/C++ code it does not work (it renders strange checkerboard artifacts, like inconsistent visual memory). To work around this issue I have to use two different framebuffers A, B and flip the textures (first render A to B, then render B back to A).
I understand that OpenGL does not allow using the same texture as both input and output (?) due to memory consistency issues.
But isn't there a more elegant solution than using two framebuffers? E.g. using some lock or temporary cache (I don't know, some synchronization flag which takes care of this)?
EDIT - Details to answer the comment/question:
OpenGL (depending the GL version) has some very specific rules of what
can and can't be done when the same texture is used as render target
and sampler input. If your use case can be implemented within this set
of requirements or not is not clear, as you have not explained what
exactly you need or want to do here.
Basically, I want to implement a fluid-dynamics solver (e.g. the one from ShaderToy linked above) as well as other partial differential equation solvers. That means each output pixel depends on some convolution mask (derivative, Laplacian, average) of neighboring pixels. There may also be some movement (advection), which means reading values from distant pixels.
Currently I realized the artifacts appear mostly when I read/write pixels which are at different places, i.e. when the dependence is non-local (e.g. pixel[100,100] depends on pixel[10,10]).
Example of simple Fluid-Solver from Shadertoy:
vec4 solveFluid(sampler2D smp, vec2 uv, vec2 w, float time, vec3 mouse, vec3 lastMouse)
{
    const float K = 0.2;
    const float v = 0.55;
    vec4 data = textureLod(smp, uv, 0.0);
    vec4 tr = textureLod(smp, uv + vec2(w.x , 0), 0.0);
    vec4 tl = textureLod(smp, uv - vec2(w.x , 0), 0.0);
    vec4 tu = textureLod(smp, uv + vec2(0 , w.y), 0.0);
    vec4 td = textureLod(smp, uv - vec2(0 , w.y), 0.0);
    vec3 dx = (tr.xyz - tl.xyz)*0.5;
    vec3 dy = (tu.xyz - td.xyz)*0.5;
    vec2 densDif = vec2(dx.z ,dy.z);
    data.z -= dt*dot(vec3(densDif, dx.x + dy.y) ,data.xyz); //density
    vec2 laplacian = tu.xy + td.xy + tr.xy + tl.xy - 4.0*data.xy;
    vec2 viscForce = vec2(v)*laplacian;
    data.xyw = textureLod(smp, uv - dt*data.xy*w, 0.).xyw; //advection
    vec2 newForce = vec2(0);
    data.xy += dt*(viscForce.xy - K/dt*densDif + newForce); //update velocity
    data.xy = max(vec2(0), abs(data.xy)-1e-4)*sign(data.xy); //linear velocity decay
#ifdef USE_VORTICITY_CONFINEMENT
    data.w = (tr.y - tl.y - tu.x + td.x);
    vec2 vort = vec2(abs(tu.w) - abs(td.w), abs(tl.w) - abs(tr.w));
    vort *= VORTICITY_AMOUNT/length(vort + 1e-9)*data.w;
    data.xy += vort;
#endif
    data.y *= smoothstep(.5,.48,abs(uv.y-0.5)); //Boundaries
    data = clamp(data, vec4(vec2(-10), 0.5 , -10.), vec4(vec2(10), 3.0 , 10.));
    return data;
}
Currently I realized the artifacts appear mostly when I read/write pixels which are at different places, i.e. when the dependence is non-local (e.g. pixel[100,100] depends on pixel[10,10]).
Yes, this is never going to work on GPUs, as there are no guarantees whatsoever on the order of individual fragment shader invocations. So whether the invocation writing to pixel [100,100] sees the result of the invocation writing to [10,10] or the original data is completely unpredictable. Per the spec, you get undefined values when reading in such a concurrent read/write scenario, so theoretically you might get neither one nor the other, but partial writes or totally different values (although that is unlikely to occur on real-world hardware).
Ordering guarantees on such a scale simply do not make sense within the render pipeline, so there is also no practical means of synchronization you could add manually to solve this issue.
To work around this issue I have to use two different framebuffers A, B and flip the textures (first render A to B, then render B back to A).
Yes, the ping-pong approach is what you should do for this use case. And honestly, it should not incur any significant performance penalty in that scenario, as you seem to write to each output pixel once anyway, so you don't need an additional copy of "untouched" pixels. All it costs is the additional memory.
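As an illustration of the shader side of ping-ponging, here is a minimal sketch of one simulation step (a plain diffusion step, not the ShaderToy solver). It reads only from the texture written in the previous pass and writes only to the currently bound framebuffer. The names u_prevState, u_texelSize, u_rate and v_uv are hypothetical, and the texture is assumed to use clamp-to-edge wrapping.

```glsl
precision highp float;

// Host side (assumed): pass N binds framebuffer B and samples texture A,
// pass N+1 binds framebuffer A and samples texture B, and so on.
uniform sampler2D u_prevState;  // state written by the previous pass (read-only here)
uniform vec2 u_texelSize;       // 1.0 / resolution of the state texture
uniform float u_rate;           // diffusion rate; keep <= 0.25 so the explicit step stays stable

varying vec2 v_uv;

void main(void) {
    vec4 c = texture2D(u_prevState, v_uv);
    vec4 r = texture2D(u_prevState, v_uv + vec2(u_texelSize.x, 0.0));
    vec4 l = texture2D(u_prevState, v_uv - vec2(u_texelSize.x, 0.0));
    vec4 u = texture2D(u_prevState, v_uv + vec2(0.0, u_texelSize.y));
    vec4 d = texture2D(u_prevState, v_uv - vec2(0.0, u_texelSize.y));

    // 5-point Laplacian; every neighbor comes from the previous state,
    // so the result does not depend on fragment execution order.
    vec4 laplacian = r + l + u + d - 4.0 * c;
    gl_FragColor = c + u_rate * laplacian;
}
```

Because no invocation ever reads what another invocation of the same pass writes, non-local reads such as the advection lookup are safe as well.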

Finding the size of a screen pixel in UV coordinates for use by the fragment shader

I've got a very detailed texture (with false color information I'm rendering with a false-color lookup in the fragment shader). My problem is that sometimes the user will zoom far away from this texture, and the fine detail will be lost: fine lines in the texture can't be seen. I would like to modify my code to make these lines pop out.
My thinking is that I can run a fast filter over neighboring texels and pick out the biggest/smallest/most interesting value to render. What I'm not sure about is how to find out if (and how much) to do this. When the user is zoomed into a triangle, I want the standard lookup. When they are zoomed out, a single pixel on the screen maps to many texture pixels.
How do I get an estimate of this? I am doing this with both orthographic and perspective cameras.
My thinking is that I could somehow use the vertex shader to get an estimate of how big one screen pixel is in UV space and pass that as a varying to the fragment shader, but I don't yet have a solid enough grasp of the transforms and spaces to work out the idea.
My current vertex shader is quite simple:
varying vec2 vUv;
varying vec3 vPosition;
varying vec3 vNormal;
varying vec3 vViewDirection;
void main() {
    vUv = uv;
    vec4 mvPosition = modelViewMatrix * vec4( position, 1.0 );
    vPosition = (modelMatrix * vec4( position, 1.0 )).xyz;
    gl_Position = projectionMatrix * mvPosition;
    vec3 transformedNormal = normalMatrix * vec3( normal );
    vNormal = normalize( transformedNormal );
    vViewDirection = normalize(mvPosition.xyz);
}
How do I get something like vDeltaUV, which gives the distance between screen pixels in UV units?
Constraints: I'm working in WebGL, inside three.js.
Here is an example of one image, where the user has zoomed perspective in close to my texture:
Here is the same example, but zoomed out; the feature above is a barely perceptible diagonal line near the center (see the coordinates to get a sense of scale). I want this line to pop out by rendering each pixel with the reddest color of the corresponding array of texels.
Addendum (re LJ's comment)...
No, I don't think mipmapping will do what I want here, for two reasons.
First, I'm not actually mapping the texture; that is, I'm doing something like this:
gl_FragColor = texture2D(mappingtexture, texture2D(inputtexture, vUv).gr);
The user dynamically creates the mappingtexture, which allows me to vary the false-color map in realtime. I think it's actually a very elegant solution to my application.
Second, I don't want to draw the AVERAGE value of neighboring pixels (i.e. smoothing) I want the most EXTREME value of neighboring pixels (i.e. something more akin to edge finding). "Extreme" in this case is technically defined by my encoding of the g/r color values in the input texture.
Solution:
Thanks to the answer below, I've now got a working solution.
In my javascript code, I had to add:
extensions: {derivatives: true}
to my declaration of the ShaderMaterial. Then in my fragment shader:
float dUdx = dFdx(vUv.x); // Difference in U between this pixel and the one to the right.
float dUdy = dFdy(vUv.x); // Difference in U between this pixel and the one above.
float dU = sqrt(dUdx*dUdx + dUdy*dUdy);
float pixel_ratio = (dU*(uInputTextureResolution));
This allows me to do things like this:
float x = ... the u coordinate in pixels in the input texture
float y = ... the v coordinate in pixels in the input texture
vec4 inc = get_encoded_adc_value(x,y);
// Extremum mapping:
if(pixel_ratio>2.0) {
    inc = most_extreme_value(inc, get_encoded_adc_value(x+1.0, y));
}
if(pixel_ratio>3.0) {
    inc = most_extreme_value(inc, get_encoded_adc_value(x-1.0, y));
}
The effect is subtle, but definitely there! The lines pop much more clearly.
Thanks for the help!
You can't do this in the vertex shader, as it is a pre-rasterization stage and hence agnostic of the output resolution, but in the fragment shader you could use dFdx, dFdy and fwidth via the GL_OES_standard_derivatives extension (which is available pretty much everywhere) to estimate the sampling footprint.
If you're not updating the texture in realtime a simpler and more efficient solution would be to generate custom mip levels for it on the CPU.
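For the derivative-based estimate, a minimal sketch that takes both U and V into account might look like the following. It is written as a standalone WebGL 1 fragment shader; uInputTextureSize is an assumed uniform holding the texture size in texels. Inside a three.js ShaderMaterial with the derivatives extension enabled, the #extension and precision lines are normally supplied by the material and may need to be dropped.

```glsl
#extension GL_OES_standard_derivatives : enable
precision highp float;

uniform vec2 uInputTextureSize;  // assumed: input texture size in texels
varying vec2 vUv;

void main() {
    // UV expressed in texel units; its screen-space derivatives tell how many
    // texels are crossed when moving one pixel in x and one pixel in y.
    vec2 texelCoord = vUv * uInputTextureSize;
    vec2 dx = dFdx(texelCoord);
    vec2 dy = dFdy(texelCoord);
    float texelsPerPixel = max(length(dx), length(dy));

    // Debug view: black while magnified (at most 1 texel per pixel),
    // brightening as more texels fall under a single pixel (white at 16 or more).
    float minification = clamp(log2(max(texelsPerPixel, 1.0)) / 4.0, 0.0, 1.0);
    gl_FragColor = vec4(vec3(minification), 1.0);
}
```

The value texelsPerPixel can then decide how many neighboring texels an extremum filter should visit, for both orthographic and perspective cameras.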

Clustering objects in GPU

My clustering algorithm is simple, and it goes like this.
The first object is grouped with all other objects whose distance from it is lower than X.
Then we go to the second object; if it is not included in the first group, we run the same algorithm on the other objects that are not included in the first group,
and so on...
I'm trying to implement this algorithm on the GPU using a fragment shader.
First I put all the locations into an RGBA float texture, setting for each pixel the location (x,y); z and w are free for now. Then I draw my calculations into a result texture using the shader. In the end I read the pixels of the result texture and continue in my code.
I have tried many variations of the code, and multi-pass drawing to perform my algorithm, but I'm not happy with the performance.
The question is:
Is there a way to do this in one run over the texture (a single draw pass)?
My latest try is this algorithm. My fragment shader:
precision highp float;
uniform sampler2D locs;
varying vec2 coord;
uniform float clusterDistance;
const float textureSize = 64.;
void main()
{
    // Getting my location
    vec4 currData = texture2D(locs, coord);
    float offsetPix = 1./textureSize/2.;
    vec2 coordIdx = (coord - offsetPix) * textureSize;
    // Getting the index of my location
    float myIdx = coordIdx.y * textureSize + coordIdx.x;
    int clusterIdx = 0;
    float clusterNum = 0.;
    // Running over all the other locations before me and finding the first object close to me
    for (float i=0.;i<textureSize*textureSize;++i)
    {
        clusterNum = i +1.;
        // Reaching my own index means no earlier object is close to me, so we stop
        if (i == myIdx)
        {
            break;
        }
        else
        {
            vec2 pntLoc = vec2(mod(i, textureSize), floor(i/textureSize)) / textureSize+offsetPix;
            vec4 pnt = texture2D(locs, pntLoc);
            if (distance(currData.xy, pnt.xy) <= clusterDistance)
            {
                break;
            }
        }
    }
    // Print the result
    gl_FragColor = vec4(currData.x, currData.y, clusterNum, 1.);
}
But the problem here is that the result can cause chain clustering. For example,
if our data is {0,0}, {4,0}, {8,0} and the max grouping distance is 4, then the first is close to the second, and the third is close to the second but not to the first. For the third object my algorithm returns the index of the second, although the second is out of the picture because it has already been grouped with the first object, and the first is the reference object for distances.
Is it possible to read from the result texture while writing to it?
It would solve my problem, because then I could check the z value of the result when comparing distances.
No, you cannot read and write to a texture in the same pass (with standard WebGL and I think not at all in the way you intend).
Your algorithm seems rather serial in nature, not well suited for GPU/SIMD execution, but I may misinterpret your intent. Remember that the GPU may run a shader program for multiple data-points (fragments/pixels in this case) at once, having no clue about the results of others.
You also can't break out of a for loop on a SIMD architecture. The for loop will just keep iterating although the changes will not be written for fragments that broke out of it. In other words there is no speed benefit. It's a different story if the break condition evaluates to the same value for all fragments.
You might want to look at other ways of clustering, like k-means.
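To illustrate why k-means maps onto a fragment shader more naturally, here is a minimal sketch of its assignment step. Note that this is a different algorithm from the distance-threshold grouping in the question. The cluster count K and the centroids uniform are assumptions; the host would have to read back the result, recompute the centroids, and run the pass again until the assignments settle.

```glsl
precision highp float;

const int K = 8;             // assumed number of clusters

uniform sampler2D locs;      // location texture, as in the question
uniform vec2 centroids[K];   // assumed: current centroids, uploaded by the host each iteration
varying vec2 coord;

void main()
{
    vec2 p = texture2D(locs, coord).xy;

    float bestDist = distance(p, centroids[0]);
    float bestIdx = 0.0;

    // Fixed trip count and no early exit: every fragment does the same amount
    // of work and never needs another fragment's result.
    for (int k = 1; k < K; ++k)
    {
        float d = distance(p, centroids[k]);
        if (d < bestDist)
        {
            bestDist = d;
            bestIdx = float(k);
        }
    }

    // Same output layout as in the question: location in xy, cluster index in z.
    gl_FragColor = vec4(p, bestIdx, 1.0);
}
```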

OpenGL artifacts using subroutine

I draw my scene using the glDrawElements function.
Since I want a single draw call to draw the complete scene,
I need a shader which switches between "materials" inside the shader.
I decided to use subroutines for the materials. Here is my fragment shader.
#version 440
layout(location = 0) flat in uvec2 inID_ShaderData;
layout(location = 1) in vec4 inPosition;
layout(location = 2) in vec2 inUV;
subroutine void shaderType(void);
subroutine uniform shaderType shaders[2];
uniform sampler2D texture0;
layout(location = 0) out vec4 outColor;
layout(location = 1) flat out uint outID;
subroutine(shaderType) void shader_flatColor(void)
{
    outColor = vec4(1,0,0,1); // test red color
    // outColor = unpackUnorm4x8(inID_ShaderData.y & 0x00ffffff); // this should be here normally
}
subroutine(shaderType) void shader_flatTexture(void)
{
    outColor = vec4(0,0,1,1); // test blue color
    // outColor = texture(texture0, inUV); // this should be here normally
}
void main()
{
    uint shader = (inID_ShaderData.y >> 24) & 0xff; // extract subroutine index from attributes
    shaders[ shader ](); // call subroutine - not working, makes artifacts
    /* calling subroutine this way works ok
    if (shader == 0) shaders[ 0 ]();
    if (shader == 1) shaders[ 1 ]();
    */
    outID = inID_ShaderData.x;
    if (outID == -1) // this condition never happens
        outColor = texture(texture0, inUV); // needed here so texture0 is not optimized out; it is needed in the subroutine
}
Question 1:
When using shaders[ shader ](), there are pixel artifacts on the quads drawn.
When using IFs, it works OK. Is it a driver bug, or am I doing something wrong?
How can this be achieved without IFs, using subroutines?
(I have a Radeon 7850 on Windows 8 64 bit)
Question 2:
In the second subroutine I want to use a texture. But if I don't use this sampler variable
in main(), then the compiler "does not see it" in the subroutine and the glUniform
function fails on the CPU side.
Is there some way to do this right, without compiler cheats such as never-happening conditions?
P.S.: Sorry, I can not post an image with the artifacts, but what should be red squares are
red squares with random blue pixels on some 5% of the area, mostly in the corners.
Blue squares have red pixels.
You simply cannot do that. Switching subroutines is not possible on a per-attribute basis. The GLSL spec has this to say about your attempt:
Subroutine variables may be declared as explicitly-sized arrays, which
can be indexed only with dynamically uniform expressions.
Thinking about how GPUs work, this restriction does make total sense. There simply is no separate control flow for every single shader invocation, but only for much larger groups.
When you use the if attempt, you are using constant indices, which of course are also dynamically uniform. You could of course try to branch based on your input attribute, but you should be aware that forcing non-uniform control flow that way might totally ruin performance. In the worst case, this will not be significantly more efficient than executing all the subroutine functions all the time, storing the results in some array, and selecting the final result from that array using your index.
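A minimal sketch of the branching variant, using ordinary functions instead of subroutine uniforms. The function names and the magenta fallback color are made up for illustration; the inputs and outputs follow the question's shader, except that the interpolation qualifier on the fragment output is omitted. The texture is sampled before the branch so that the lookup with implicit derivatives stays in uniform control flow, which also means the sampler is always referenced from main().

```glsl
#version 440

layout(location = 0) flat in uvec2 inID_ShaderData;
layout(location = 2) in vec2 inUV;

uniform sampler2D texture0;

layout(location = 0) out vec4 outColor;
layout(location = 1) out uint outID;

// Hypothetical material functions replacing the subroutines.
vec4 material_flatColor()
{
    // Low 24 bits of the attribute hold the packed color.
    return unpackUnorm4x8(inID_ShaderData.y & 0x00ffffffu);
}

vec4 material_flatTexture(vec4 texel)
{
    return texel;
}

void main()
{
    uint material = (inID_ShaderData.y >> 24) & 0xffu;

    // Sample unconditionally, outside the per-fragment branch.
    vec4 texel = texture(texture0, inUV);

    switch (material)
    {
        case 0u: outColor = material_flatColor(); break;
        case 1u: outColor = material_flatTexture(texel); break;
        default: outColor = vec4(1.0, 0.0, 1.0, 1.0); break; // unknown material: magenta
    }

    outID = inID_ShaderData.x;
}
```

Whether this performs acceptably depends on how often neighboring fragments take different branches, as noted above.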

How can a fragment shader use the color values of the previously rendered frame?

I am learning to use shaders in OpenGL ES.
As an example: Here's my playground fragment shader which takes the current video frame and makes it grayscale:
varying highp vec2 textureCoordinate;
uniform sampler2D videoFrame;
void main() {
highp vec4 theColor = texture2D(videoFrame, textureCoordinate);
highp float avrg = (theColor[0] + theColor[1] + theColor[2]) / 3.0;
theColor[0] = avrg; // r
theColor[1] = avrg; // g
theColor[2] = avrg; // b
gl_FragColor = theColor;
}
theColor represents the current pixel. It would be cool to also get access to the previous pixel at this same coordinate.
For the sake of curiosity, I would like to add or multiply the color of the current pixel to the color of the pixel in the previous render frame.
How could I keep the previous pixels around and pass them in to my fragment shader in order to do something with them?
Note: It's OpenGL ES 2.0 on the iPhone.
You need to render the previous frame to a texture, using a Framebuffer Object (FBO), then you can read this texture in your fragment shader.
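On the shader side this could look like the sketch below, assuming the host has rendered the previous output into an FBO-backed texture and bound it as a second sampler. The names previousFrame and blend are hypothetical. Since a texture cannot be sampled while it is also the render target, the host would typically alternate between two such textures from frame to frame.

```glsl
varying highp vec2 textureCoordinate;
uniform sampler2D videoFrame;      // current video frame, as in the question
uniform sampler2D previousFrame;   // assumed: color texture of the FBO rendered in the last frame
uniform highp float blend;         // assumed: 0.0 = only current frame, 1.0 = only previous frame

void main() {
    highp vec4 current = texture2D(videoFrame, textureCoordinate);
    highp vec4 previous = texture2D(previousFrame, textureCoordinate);

    // Cross-fade between the two frames; for the add or multiply effect
    // use current + previous or current * previous instead.
    gl_FragColor = mix(current, previous, blend);
}
```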
The dot intrinsic function that Damon refers to is a code implementation of the mathematical dot product; in GLSL it is the built-in function dot(a, b). Mathematically, a dot product goes like this:
Given a vector a and a vector b, the 'dot' product a 'dot' b produces a scalar result c:
c = a.x * b.x + a.y * b.y + a.z * b.z
Most modern graphics hardware (and CPUs, for that matter) is capable of performing this kind of operation in one pass. In your particular case, you could compute your average easily with a dot product like so:
highp vec4 weights = vec4(1.0/3.0, 1.0/3.0, 1.0/3.0, 0.0); // 4th component is zero so alpha does not contribute
I always get the 4th component in homogeneous vectors and matrices mixed up for some reason.
highp float avg = dot(theColor, weights);
This will multiply each component of theColor by 1/3 (and the 4th component by 0), and then add them together.
