I draw my scene using the glDrawElements function.
Since I want to reach a state where one draw call draws the complete scene,
I need a shader that switches between "materials" inside the shader.
I decided to use subroutines for the materials. Here is my fragment shader.
#version 440
layout(location = 0) flat in uvec2 inID_ShaderData;
layout(location = 1) in vec4 inPosition;
layout(location = 2) in vec2 inUV;
subroutine void shaderType(void);
subroutine uniform shaderType shaders[2];
uniform sampler2D texture0;
layout(location = 0) out vec4 outColor;
layout(location = 1) flat out uint outID;
subroutine(shaderType) void shader_flatColor(void)
{
outColor = vec4(1,0,0,1); // test red color
// outColor = unpackUnorm4x8(inID_ShaderData.y & 0x00ffffff); // this should be here normally
}
subroutine(shaderType) void shader_flatTexture(void)
{
outColor = vec4(0,0,1,1); // test blue color
// outColor = texture(texture0, inUV); // this should be here normally
}
void main()
{
uint shader = (inID_ShaderData.y >> 24) & 0xff; // extract subroutine index from attributes
shaders[ shader ](); // call subroutine - not working, makes artifacts
/* calling subroutine this way works ok
if (shader == 0) shaders[ 0 ]();
if (shader == 1) shaders[ 1 ]();
*/
outID = inID_ShaderData.x;
if (outID == -1) // this condition never happens
outColor = texture(texture0, inUV); // needed here so texture0 is not optimized out; it is needed in the subroutine
}
Question 1:
When using shaders[ shader ]();, there are pixel artifacts on the drawn quads.
When using the IFs, it works OK. Is it a driver bug, or am I doing something wrong?
How can this be achieved with subroutines, without IFs?
(I have a Radeon 7850 on Windows 8 64-bit.)
Question 2:
In the second subroutine I want to use a texture. But if I don't use this sampler variable
in main(), the compiler "does not see it" in the subroutine, and the CPU-side glUniform
call fails.
Is there a way to do this properly, without compiler cheats such as never-happening conditions?
P.S.: Sorry, I cannot post an image of the artifacts, but what should be red squares are
red squares with random blue pixels on roughly 5% of their area, mostly in the corners.
The blue squares have red pixels.
You simply cannot do that. Switching subroutines is not possible on a per-attribute basis. The GLSL spec has this to say about your attempt:
Subroutine variables may be declared as explicitly-sized arrays, which
can be indexed only with dynamically uniform expressions.
Thinking about how GPUs work, this restriction does make total sense. There simply is no separate control flow for every single shader invocation, but only for much larger groups.
When you use the if approach, you are using constant indices, which of course are also dynamically uniform. You could try to branch based on your input attribute instead, but be aware that forcing non-uniform control flow that way might badly hurt performance. In the worst case, it will not be significantly more efficient than executing all the subroutine functions every time, storing the results in an array, and selecting the final result from that array using your index.
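For illustration, here is a minimal sketch of that worst-case fallback, with the two materials rewritten as plain functions (matColor and matTexture are hypothetical names; the inputs, outputs and uniforms are assumed to be the same as in the shader above):
vec4 matColor(void)   { return vec4(1,0,0,1); }             // was shader_flatColor
vec4 matTexture(void) { return texture(texture0, inUV); }   // was shader_flatTexture
void main()
{
    uint shader = (inID_ShaderData.y >> 24) & 0xffu;
    vec4 results[2];
    results[0] = matColor();      // every material is evaluated...
    results[1] = matTexture();
    outColor = results[shader];   // ...and the non-uniform index only selects a result
    outID = inID_ShaderData.x;
}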
Some fragment shaders on ShaderToy (e.g. fluid dynamics, https://www.shadertoy.com/view/4tGfDW ) use the same buffer as both input and output. But when I try to do this in my C/C++ code it does not work (it renders strange checkerboard artifacts, as if the visible memory were inconsistent). To work around this issue I have to use two different framebuffers A and B and flip the textures (first render A to B, then render B back to A).
I understand that OpenGL does not allow using the same texture as both input and output (?) due to memory consistency issues.
But isn't there a more elegant solution than using two framebuffers? E.g. some lock, or a temporary cache (some synchronization flag that takes care of this)?
EDIT - Details to answer the comment/question:
OpenGL (depending the GL version) has some very specific rules of what
can and can't be done when the same texture is used as render target
and sampler input. If your use case can be implemented within this set
of requirements or not is not clear, as you have not explained what
exactly you need or want to do here.
Basically I want to implement a fluid-dynamics solver (e.g. the one from the ShaderToy linked above) as well as other partial differential equation solvers. That means each pixel's output depends on some convolution mask (derivative, Laplacian, average) of neighboring pixels. There may also be some movement (advection), which means reading values from distant pixels.
I have realized that the artifacts appear mostly when I read/write pixels that are in different places, i.e. when the access is non-local (e.g. pixel [100,100] depends on pixel [10,10]).
Example of a simple fluid solver from Shadertoy (dt, USE_VORTICITY_CONFINEMENT and VORTICITY_AMOUNT are defined in that shader's common code):
vec4 solveFluid(sampler2D smp, vec2 uv, vec2 w, float time, vec3 mouse, vec3 lastMouse)
{
const float K = 0.2;
const float v = 0.55;
vec4 data = textureLod(smp, uv, 0.0);
vec4 tr = textureLod(smp, uv + vec2(w.x , 0), 0.0);
vec4 tl = textureLod(smp, uv - vec2(w.x , 0), 0.0);
vec4 tu = textureLod(smp, uv + vec2(0 , w.y), 0.0);
vec4 td = textureLod(smp, uv - vec2(0 , w.y), 0.0);
vec3 dx = (tr.xyz - tl.xyz)*0.5;
vec3 dy = (tu.xyz - td.xyz)*0.5;
vec2 densDif = vec2(dx.z ,dy.z);
data.z -= dt*dot(vec3(densDif, dx.x + dy.y) ,data.xyz); //density
vec2 laplacian = tu.xy + td.xy + tr.xy + tl.xy - 4.0*data.xy;
vec2 viscForce = vec2(v)*laplacian;
data.xyw = textureLod(smp, uv - dt*data.xy*w, 0.).xyw; //advection
vec2 newForce = vec2(0);
data.xy += dt*(viscForce.xy - K/dt*densDif + newForce); //update velocity
data.xy = max(vec2(0), abs(data.xy)-1e-4)*sign(data.xy); //linear velocity decay
#ifdef USE_VORTICITY_CONFINEMENT
data.w = (tr.y - tl.y - tu.x + td.x);
vec2 vort = vec2(abs(tu.w) - abs(td.w), abs(tl.w) - abs(tr.w));
vort *= VORTICITY_AMOUNT/length(vort + 1e-9)*data.w;
data.xy += vort;
#endif
data.y *= smoothstep(.5,.48,abs(uv.y-0.5)); //Boundaries
data = clamp(data, vec4(vec2(-10), 0.5 , -10.), vec4(vec2(10), 3.0 , 10.));
return data;
}
I have realized that the artifacts appear mostly when I read/write pixels that are in different places, i.e. when the access is non-local (e.g. pixel [100,100] depends on pixel [10,10]).
Yes, this is never going to work on GPUs, as there are no particular guarantees on the order of individual fragment shader invocations whatsoever. So whether the invocation writing to pixel [100,100] sees the results of the invocation writing to [10,10] or the original data is totally random. As per the spec, you get undefined values when reading in such a concurrent read/write scenario, so theoretically you could even get neither one nor the other, but see partial writes or totally different values (although that's not likely to occur on real-world hardware).
And order guarantees at such a scale simply do not make sense within the render pipeline, so there is also no practical means of synchronization you can manually add to solve this issue.
To work around this issue I have to use two different framebuffers A and B and flip the textures (first render A to B, then render B back to A)
Yes, the ping-pong approach is what you should do for this use case. And honestly, it should not incur any significant performance penalty in this scenario, as you seem to write each output pixel exactly once, so you don't need an additional copy of the "untouched" pixels. All it costs is the additional memory.
I've got a very detailed texture (with false color information I'm rendering with a false-color lookup in the fragment shader). My problem is that sometimes the user will zoom far away from this texture, and the fine detail will be lost: fine lines in the texture can't be seen. I would like to modify my code to make these lines pop out.
My thinking is that I can run a fast filter over neighboring texels and pick out the biggest/smallest/most interesting value to render. What I'm not sure about is how to find out whether (and how much) to do this. When the user is zoomed in on a triangle, I want the standard lookup. When they are zoomed out, a single pixel on the screen maps to many texture pixels.
How do I get an estimate of this? I am doing this with both orthographic and perspective cameras.
My thinking is that I could somehow use the vertex shader to get an estimate of how big one screen pixel is in UV space and pass that as a varying to the fragment shader, but I still don't have a solid enough grasp of the transforms and spaces involved to work out the details.
My current vertex shader is quite simple:
varying vec2 vUv;
varying vec3 vPosition;
varying vec3 vNormal;
varying vec3 vViewDirection;
void main() {
vUv = uv;
vec4 mvPosition = modelViewMatrix * vec4( position, 1.0 );
vPosition = (modelMatrix *
vec4(position,1.0)).xyz;
gl_Position = projectionMatrix * mvPosition;
vec3 transformedNormal = normalMatrix * vec3( normal );
vNormal = normalize( transformedNormal );
vViewDirection = normalize(mvPosition.xyz);
}
How do I get something like vDeltaUV, which gives the distance between screen pixels in UV units?
Constraints: I'm working in WebGL, inside three.js.
Here is an example of one image, where the user has zoomed the perspective camera in close to my texture:
Here is the same example, but zoomed out; the feature above is a barely perceptible diagonal line near the center (see the coordinates to get a sense of scale). I want this line to pop out by rendering all pixels with the reddest color of the corresponding array of texels.
Addendum (re LJ's comment)...
No, I don't think mipmapping will do what I want here, for two reasons.
First, I'm not actually mapping the texture; that is, I'm doing something like this:
gl_FragColor = texture2D(mappingtexture, vec2(inputtexture.g, inputtexture.r));
The user dynamically creates the mappingtexture, which allows me to vary the false-color map in realtime. I think it's actually a very elegant solution to my application.
Second, I don't want to draw the AVERAGE value of neighboring pixels (i.e. smoothing); I want the most EXTREME value of neighboring pixels (i.e. something more akin to edge finding). "Extreme" in this case is technically defined by my encoding of the g/r color values in the input texture.
Solution:
Thanks to the answer below, I've now got a working solution.
In my javascript code, I had to add:
extensions: {derivatives: true}
to my declaration of the ShaderMaterial. Then in my fragment shader:
float dUdx = dFdx(vUv.x); // Difference in U between this pixel and the one to the right.
float dUdy = dFdy(vUv.x); // Difference in U between this pixel and the one above.
float dU = sqrt(dUdx*dUdx + dUdy*dUdy);
float pixel_ratio = (dU*(uInputTextureResolution));
This allows me to do things like this:
float x = ... the u coordinate in pixels in the input texture
float y = ... the v coordinate in pixels in the input texture
vec4 inc = get_encoded_adc_value(x,y);
// Extremum mapping:
if(pixel_ratio>2.0) {
inc = most_extreme_value(inc, get_encoded_adc_value(x+1.0, y));
}
if(pixel_ratio>3.0) {
inc = most_extreme_value(inc, get_encoded_adc_value(x-1.0, y));
}
The effect is subtle, but definitely there! The lines pop much more clearly.
Thanks for the help!
You can't do this in the vertex shader, as it runs before rasterization and is therefore agnostic of the output resolution, but in the fragment shader you can use dFdx, dFdy and fwidth via the GL_OES_standard_derivatives extension (which is available pretty much everywhere) to estimate the sampling footprint.
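A minimal sketch of that footprint estimate in the fragment shader (uInputTextureResolution mirrors the uniform used in the solution above and is assumed to hold the texture size in texels):
#extension GL_OES_standard_derivatives : enable
precision highp float;

varying vec2 vUv;                       // interpolated UV from the vertex shader
uniform float uInputTextureResolution;  // texture size in texels (assumed square)

void main() {
    // fwidth(vUv) = abs(dFdx(vUv)) + abs(dFdy(vUv)): how much UV changes per screen pixel.
    vec2 uvPerPixel = fwidth(vUv);
    // Approximate number of texels one screen pixel spans along each axis.
    vec2 texelsPerPixel = uvPerPixel * uInputTextureResolution;
    float footprint = max(texelsPerPixel.x, texelsPerPixel.y);
    // footprint <= 1.0: zoomed in, the standard lookup is fine;
    // footprint >  1.0: widen the search for the most extreme neighboring texel.
    gl_FragColor = vec4(vec3(clamp(footprint / 8.0, 0.0, 1.0)), 1.0); // visualize the estimate
}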
If you're not updating the texture in realtime, a simpler and more efficient solution would be to generate custom mip levels for it on the CPU.
I have some data encoded in a 2k by 2k floating-point texture. The data are longitude, latitude, time, and date as R, G, B, A. They are all normalized, but for now that is not a problem; I can de-normalize them later if I want to.
What I need now is to iterate through the whole texture and find which longitude and latitude should fall in a given fragment coordinate. I assume that the whole atlas has normalized coordinates and maps onto the whole OpenGL context. Besides the coordinates I will also filter the data by time and date, but that is an easy if condition. Because the pixel coordinates I have will not map exactly onto a fragment coordinate, I will use a small delta value to fix that for now, and I will use that delta value to precompute other points that are close to that coordinate.
Now I get driver crashes on the iGPU (probably out of memory or something similar) whenever I add anything inside the two nested for loops, even if it is just a discard.
The code I have now is this.
NOTE: f_time is the time filter; for now I control it with a slider so that I have some interaction with the values.
precision mediump float;
precision mediump int;
const int maxTextureSize = 2048;
varying vec2 v_texCoord;
uniform sampler2D u_texture;
uniform float f_time;
uniform ivec2 textureDimensions;
void main(void) {
float delta = 0.001;// now bigger delta just to make it work then we tune it
// compute 1 pixel in texture coordinates.
vec2 onePixel = vec2(1.0, 1.0) / float(textureDimensions.x);
vec2 position = ( gl_FragCoord.xy / float(textureDimensions.x) );
vec4 color = texture2D(u_texture, v_texCoord);
vec4 outColor = vec4(0.0);
float dist_x = distance( color.r, gl_FragCoord.x);
float dist_y = distance( color.g, gl_FragCoord.y);
//float dist_x = distance( color.g, gl_PointCoord.s);
//float dist_y = distance( color.b, gl_PointCoord.t);
for(int i = 0; i < maxTextureSize; i++){
if(i >= textureDimensions.x ){ // stop once i reaches the actual texture width
break;
}
for(int j = 0; j < maxTextureSize ; j++){
if(j >= textureDimensions.y ){ // stop once j reaches the actual texture height
break;
}
// This is where I am stuck: building the texture coordinate and testing it in the fragment shader
// (the precomputation)
vec4 pixel = texture2D(u_texture, vec2(float(i), float(j)) / vec2(textureDimensions)); // normalize the indices to [0,1] texture coordinates
if(pixel.r > f_time){
outColor = vec4(1.0, 1.0, 1.0, 1.0);
// for now just break, no delta calculation to sum this point with others so that
// we will have an approximation of other points into that pixel
break;
}
}
}
// this works
if(color.t > f_time){
//gl_FragColor = color;//;vec4(1.0, 1.0, 1.0, 1.0);
}
gl_FragColor = outColor;
}
What you are trying to do is simply not feasible.
You are trying to access a texture up to four million times, all within a single fragment shader invocation.
The way modern GPUs usually detect infinite loop conditions is by seeing how long your shader runs, and then killing it if it has run for "too long", the length of which is usually sufficiently generous. Your code, which does up to 4 million texture accesses, will almost certainly trigger this condition.
Which typically leads to a GPU reset.
Generally speaking, the way you would find the position in a texture which is associated with some fragment is to do so directly. That is, create a 1:1 correspondence between screen fragment locations (gl_FragCoord) and texels in the texture. That way, your texture does not need to contain X/Y coordinates, and each fragment shader can access the data meant for that specific invocation.
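As a rough sketch of that 1:1 scheme (uTexSize is an assumed uniform holding the data texture's dimensions, the render target is assumed to have the same size, and time is assumed to sit in the blue channel as described in the question):
precision mediump float;

uniform sampler2D u_texture;  // the 2k by 2k data texture
uniform vec2 uTexSize;        // assumed uniform, e.g. vec2(2048.0, 2048.0)
uniform float f_time;

void main(void) {
    // gl_FragCoord is in pixels; dividing by the texture size yields exactly
    // the texel that belongs to this fragment - no loops, no searching.
    vec4 data = texture2D(u_texture, gl_FragCoord.xy / uTexSize);
    // Each invocation only filters its own value.
    gl_FragColor = data.b > f_time ? vec4(1.0) : vec4(0.0);
}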
What you're trying to do seems to be to pass a large table (four million elements) to the GPU, and then have the GPU process it. The ordering of values is (generally) irrelevant; any value could potentially modify any pixel. Some pixels don't have values applied to them, while others may have multiple values applied.
This is serial programmer thinking, not parallel thinking. The way you'd code that on the CPU is to walk each element in the table, look at where it goes, and build the results for each pixel.
In a parallel algorithm, you don't work that way. Each invocation needs to be able to instantly find the data in the table that applies to it. You should never be doing some kind of search through a table for your data. Especially not a linear search.
You need to think of this from the perspective of your fragment shader.
In your data table, for each position on the screen, there is a list of data values that apply to that screen position. Correct? What you need to do is make that list directly available to each fragment shader invocation. And since each fragment's list is not constant in size, you will need to use a linked list rather than a fixed-size array.
To do this, you build a texture the size of your render target. Each texel in the texture specifies the location in the data table of the first element that this fragment needs to process. This provides every fragment shader invocation with the location of its first element. Since some fragment shaders may have no data applied to them, you need to set aside some special texture coordinate value to represent "none".
The data in the data table consists of your time and date, but rather than "longitude/latitude", it has the texture coordinate of the next texel in the texture that applies for that fragment shader. This is how you make a linked list in shaders. Each location in the data table specifies the next location to be processed.
If that location was the last data to be processed, then the location will be the "none" value from before.
You should also be using a buffer texture or an SSBO to hold your data table, rather than a 2D texture. It would make things much easier.
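A rough GLSL sketch of that traversal, shown with plain 2D textures for brevity (in practice a buffer texture or SSBO would hold the table): uHead is an assumed head-pointer texture the size of the render target, uData an assumed data texture whose texels store time, date and the normalized coordinate of the next entry, with a negative coordinate acting as the "none" value. All names are illustrative:
precision mediump float;

uniform sampler2D uHead;      // per-fragment pointer to the first table entry
uniform sampler2D uData;      // r = time, g = date, ba = coords of the next entry
uniform vec2 uScreenSize;
uniform float f_time;
const int MAX_ENTRIES = 64;   // constant loop bound required by GLSL ES; cap on list length

void main(void) {
    vec2 ptr = texture2D(uHead, gl_FragCoord.xy / uScreenSize).xy;
    vec4 result = vec4(0.0);
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (ptr.x < 0.0) break;           // hit the "none" sentinel: the list has ended
        vec4 entry = texture2D(uData, ptr);
        if (entry.r > f_time)
            result = vec4(1.0);           // a value for this fragment passes the time filter
        ptr = entry.ba;                   // follow the link to the next entry
    }
    gl_FragColor = result;
}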
My clustering algorithm is simple, and it goes like this.
The first object is grouped with all other objects whose distance from it is lower than X.
Then we go to the second object; if it is not included in the first group, we run the same algorithm on the other objects that are not included in the first group,
and so on...
I'm trying to do this algorithm on the GPU using the fragment shader.
First I write all the locations into an RGBA float texture, setting for each pixel the location (x,y) - z and w are free for now. Then I draw my calculations into a result texture using the shader. In the end I will read the pixels of the result texture and run my code on them.
I have tried many variations of the code, and multi-pass draws to perform my algorithm, but I'm not happy with the time performance.
The question is:
Is there a way to perform this in one run over the texture (a single draw pass)?
My latest attempt is this algorithm - my fragment shader:
precision highp float;
uniform sampler2D locs;
varying vec2 coord;
uniform float clusterDistance;
const float textureSize = 64.;
void main()
{
// Getting my location
vec4 currData = texture2D(locs, coord);
float offsetPix = 1./textureSize/2.;
vec2 coordIdx = (coord - offsetPix) * textureSize;
// Getting the index of my location
float myIdx = coordIdx.y * textureSize + coordIdx.x;
int clusterIdx = 0;
float clusterNum = 0.;
// Running over all the other locations until me and finding the first close object to me
for (float i=0.;i<textureSize*textureSize;++i)
{
clusterNum = i +1.;
// This means we didn't find any object close to me, so we stop
if (i == myIdx)
{
break;
}
else
{
vec2 pntLoc = vec2(mod(i, textureSize), floor(i/textureSize)) / textureSize+offsetPix;
vec4 pnt = texture2D(locs, pntLoc);
if (distance(currData.xy, pnt.xy) <= clusterDistance)
{
break;
}
}
}
// Print the result
gl_FragColor = vec4(currData.x, currData.y, clusterNum, 1.);
}
But the problem here is that the result can cause chained clustering. For example,
if our data is {0,0}, {4,0}, {8,0} and the max distance to group is 4, then the first is close to the second, and the third is close to the second but not to the first. According to my algorithm it returns the index of the second for the third object, although the second is out of the picture, because it is already grouped with the first object, and the first is the reference object for distances.
Is it possible to read from the result texture while writing to it?
That would solve my problem, because then I could check the z value of the result when comparing distances.
No, you cannot read and write to a texture in the same pass (with standard WebGL and I think not at all in the way you intend).
Your algorithm seems rather serial in nature, not well suited for GPU/SIMD execution, but I may misinterpret your intent. Remember that the GPU may run a shader program for multiple data-points (fragments/pixels in this case) at once, having no clue about the results of others.
You also can't break out of a for loop on a SIMD architecture. The for loop will just keep iterating although the changes will not be written for fragments that broke out of it. In other words there is no speed benefit. It's a different story if the break condition evaluates to the same value for all fragments.
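Conceptually, the hardware turns the loop from the shader above into a predicated form roughly like this (a simplified sketch reusing the variables from that shader):
// Every iteration still runs for the whole SIMD group; "break" merely
// masks out further writes for fragments that are already done.
bool done = false;
for (float i = 0.; i < textureSize * textureSize; ++i) {
    if (!done) {
        clusterNum = i + 1.;
        if (i == myIdx) {
            done = true;               // was: break
        } else {
            vec2 pntLoc = vec2(mod(i, textureSize), floor(i / textureSize)) / textureSize + offsetPix;
            if (distance(currData.xy, texture2D(locs, pntLoc).xy) <= clusterDistance) {
                done = true;           // was: break
            }
        }
    }
}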
You might want to look at other ways of clustering, like k-means.
I am in the process of implementing this Windows Aero theme on my ARM Mali GPU.
I want an effect like the one shown in this image:
http://vistastyles.org/wp-content/uploads/2007/06/windows_aero_by_blingboy31.jpg
My question is: how do I identify which area of the top image is intermixed with the bottom image?
It is possible that multiple images overlap, so do I need to take the weight of each image or layer to determine the amount of blur to show?
And how do I draw the images? Do I need to draw the blurred image of the background first and then the normal image, or the other way around?
Any help is appreciated...
SEASON 2
/*******************************************/
Well, I tried datenWolf's way, but got stuck at the blur shader.
I used this fragment shader for blurring, but it doesn't give the exact translucent effect I wish to get:
#version 120
uniform sampler2D sceneTex;
uniform float rt_w;
uniform float rt_h;
uniform float vx_offset;
float offset[3] = float[]( 0.0, 1.3846153846, 3.2307692308 );
float weight[3] = float[]( 0.2270270270, 0.3162162162, 0.0702702703 );
void main()
{
vec3 tc = vec3(1.0, 0.0, 0.0);
if (gl_TexCoord[0].x<(vx_offset-0.01))
{
vec2 uv = gl_TexCoord[0].xy;
tc = texture2D(sceneTex, uv).rgb * weight[0];
for (int i=1; i<3; i++)
{
tc += texture2D(sceneTex, uv + vec2(offset[i], 0.0)/rt_w, 0.0).rgb * weight[i]; // horizontal tap to the right
tc += texture2D(sceneTex, uv - vec2(offset[i], 0.0)/rt_w, 0.0).rgb * weight[i]; // horizontal tap to the left
}
}
else if (gl_TexCoord[0].x>=(vx_offset+0.01))
{
tc = texture2D(sceneTex, gl_TexCoord[0].xy).rgb;
}
gl_FragColor = vec4(tc, 1.0);
}
I was looking for a blur effect as shown in this link:
http://incubator.quasimondo.com/processing/superfast_blur.php
but this software algorithm does not work in my code. Any help with a blur shader that gives this kind of effect would be appreciated. I am using OpenGL ES 2.0 on ARM Mali.
There's no detection process involved in window composition. Every window is available as a separate texture, and the windows are drawn in depth order with a blurring step in between.
It is done roughly like this: you have two framebuffer objects A and B for the whole screen. For each window you
draw the area the window covers from B to A with a blurring filter applied
you draw the actual window contents on top of this to B
swap A←→B
Repeat this for every window in depth order. Basically it comes down to a number of texture switches, and although those are somewhat expensive, the number of (overlapping) windows on a screen will always stay within reasonable figures, so this is not really a problem.