GLES Encode/Decode 32bits float to 2x16bits

I'm trying to optimize texture memory, and all that stops me from converting a GL_RGBA32F LUT to GL_RGBA16F is one index that (might) exceed the limit. Is there any way that I could, in C, take a float and split it into 2 values, and then in GLSL reconstruct that float from the 2 values stored in the LUT?
What I mean is something like this:
[ C ]
float v0,v1, *pixel_array;
magic_function_in_c( my_big_value, &v0, &v1 );
pixel_array[ index++ ] = pos.x; // R
pixel_array[ index++ ] = pos.y; // G
pixel_array[ index++ ] = v0; // B
pixel_array[ index++ ] = v1; // A
[ GLSL ]
vec4 lookup = texture2D( sampler0, texcoord );
float v = magic_function_in_glsl( lookup.b, lookup.a );
PS: I'm using GLES 2.0 (to also be compatible with WebGL).

If you just need more range than float16 provides, and only in one direction (larger or smaller), you can multiply by a fixed scaling factor.
For instance, if you need to store some number N greater than 65504 (the largest finite float16 value), you can 'encode' by dividing N by 2, and 'decode' by multiplying by 2. This shifts the effective range up, sacrificing the range of 1/N, but expanding the range maximum for +/-N. You can swap the multiply and divide if you need more range in 1/N than in +/-N. You can use the second value to store what the scaling factor is, if you need it to change based on data.
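As a rough illustration of the fixed scaling idea, here is a minimal C sketch. The names `SCALE`, `encode_scaled` and `decode_scaled` are hypothetical, and the sketch assumes a power-of-two factor so the divide and multiply are exact; it does not model the float16 rounding that happens when the texture is uploaded.

```c
/* Hypothetical helpers: SCALE is an assumed power-of-two factor, so the
   division and multiplication below are exact in binary floating point. */
#define SCALE 2.0f

/* CPU side: shrink the value before it is stored in the float16 LUT. */
static float encode_scaled(float n) { return n / SCALE; }

/* Shader side (written in C here): undo the scaling after the lookup. */
static float decode_scaled(float stored) { return stored * SCALE; }
```

In the shader the decode step would be a single multiply of the fetched channel by the same constant.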
You can also experiment with exp2 and log2, something like:
void
magic_function_in_c(float fVal, uint16_t* hExponent, uint16_t* hMult)
{
// f32_to_f16 / f16_to_f32 are assumed float<->half conversion helpers.
// log2f is not defined for negative input, so take the magnitude; the sign ends up in fMult.
float fExponent = log2f(fabsf(fVal));
*hExponent = f32_to_f16(fExponent);
// Compensate for f32->f16 precision loss
float fActualExponent = f16_to_f32(*hExponent);
float fValFromExponent = exp2f(fActualExponent);
float fMult;
if (fValFromExponent != 0.0f) {
fMult = fVal / fValFromExponent;
} else if (fVal < 0.0f) {
fMult = -1.0f;
} else {
fMult = 1.0f;
}
*hMult = f32_to_f16(fMult);
}
highp float
magic_function_in_glsl(highp float hExponent, highp float hMult)
{
return exp2(hExponent) * hMult;
}
Note that none of this will work if you don't have highp floats in your GLSL shader.
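Since the question describes the problematic value as an index, a different technique from the two above may also be worth trying: splitting a non-negative integer into two parts that float16 can each hold exactly. The sketch below is an assumption-laden illustration, not part of the original answer. It relies on float16 representing every integer from 0 to 2048 exactly, and the names `HALF_EXACT_INT`, `split_index` and `join_index` are hypothetical.

```c
#include <stdint.h>

/* float16 stores every integer in 0..2048 exactly, so use 2048 as the base.
   Works for indices below 2048 * 2048 = 4194304. */
#define HALF_EXACT_INT 2048u

/* CPU side: produce the two values that go into the B and A channels. */
static void split_index(uint32_t index, float *hi, float *lo)
{
    *hi = (float)(index / HALF_EXACT_INT);
    *lo = (float)(index % HALF_EXACT_INT);
}

/* Shader side (written in C here): rebuild the index from the two channels. */
static uint32_t join_index(float hi, float lo)
{
    return (uint32_t)(hi * (float)HALF_EXACT_INT + lo);
}
```

The GLSL counterpart of `join_index` would be `lookup.b * 2048.0 + lookup.a`, which still needs enough float precision in the shader to hold the reconstructed value.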

Related

GLSL uv lookup and precision with FBO / RenderTarget in Three.js

My application is coded in Javascript + Three.js / WebGL + GLSL. I have 200 curves, each one made of 85 points. To animate the curves I add a new point and remove the last.
So I made a positions shader that stores the new positions onto a texture (1) and the lines shader that writes the positions for all curves on another texture (2).
The goal is to use textures as arrays: I know the first and last index of a line, so I need to convert those indices to uv coordinates.
I use FBOHelper to debug FBOs.
1) This 1D texture contains the new points for each curve (200 in total): positionTexture
2) And these are the 200 curves, with all their points, one after the other: linesTexture
The black parts are the BUG here. Those texels shouldn't be black.
How does it work: at each frame the shader looks up the new point for each line in the positionTexture and updates the linesTextures accordingly, with a for loop like this:
#define LINES_COUNT 200.0
#define LINE_POINTS 85.0 // with 100 it works!!!
// Then in main()
vec2 uv = gl_FragCoord.xy / resolution.xy;
for (float i = 0.0; i < LINES_COUNT; i += 1.0) {
float startIdx = i * LINE_POINTS; // line start index
float endIdx = startIdx + LINE_POINTS - 1.0; // line end index
vec2 lastCell = getUVfromIndex(endIdx); // last uv coordinate reserved for current line
if (match(lastCell, uv)) {
pos = texture2D( positionTexture, vec2((i / LINES_COUNT) + minFloat, 0.0)).xyz;
} else if (index >= startIdx && index < endIdx) {
pos = texture2D( lineTexture, getNextUV(uv) ).xyz;
}
}
This works, but it's slightly buggy when I have many lines (150+): likely a precision problem. I'm not sure if the functions I wrote to look up the textures are right. I wrote functions like getNextUV(uv) to get the value from the next index (converted to uv coordinates) and copy to the previous. Or match(xy, uv) to know if the current fragment is the texel I want.
I thought I could simply use the classic formula:
index = uv.y * width + uv.x
But it's more complicated than that. For example match():
// Whether a point XY is within a UV coordinate
float size = 132.0; // width and height of texture
float unit = 1.0 / size;
float minFloat = unit / size;
bool match(vec2 point, vec2 uv) {
vec2 p = point;
float x = floor(p.x / unit) * unit;
float y = floor(p.y / unit) * unit;
return x <= uv.x && x + unit > uv.x && y <= uv.y && y + unit > uv.y;
}
Or getUVfromIndex():
vec2 getUVfromIndex(float index) {
float row = floor(index / size); // Example: 83.56 / 10 = 8
float col = index - (row * size); // Example: 83.56 - (8 * 10) = 3.56
col = col / size + minFloat; // u = 0.357
row = row / size + minFloat; // v = 0.81
return vec2(col, row);
}
Can someone explain what's the most efficient way to lookup values in a texture, by getting a uv coordinate from index value?
Texture coordinates are measured from the edges of pixels, not their centers, so your formula to compute UV coordinates needs to be
u = (xPixelCoord + .5) / widthOfTextureInPixels;
v = (yPixelCoord + .5) / heightOfTextureInPixels;
So I'm guessing you want getUVfromIndex to be
uniform vec2 sizeOfTexture; // allow texture to be any size
vec2 getUVfromIndex(float index) {
float widthOfTexture = sizeOfTexture.x;
float col = mod(index, widthOfTexture);
float row = floor(index / widthOfTexture);
return (vec2(col, row) + .5) / sizeOfTexture;
}
Or, based on some other experience with math issues in shaders, you might need to fudge the index
uniform vec2 sizeOfTexture; // allow texture to be any size
vec2 getUVfromIndex(float index) {
float fudgedIndex = index + 0.1;
float widthOfTexture = sizeOfTexture.x;
float col = mod(fudgedIndex, widthOfTexture);
float row = floor(fudgedIndex / widthOfTexture);
return (vec2(col, row) + .5) / sizeOfTexture;
}
If you're in WebGL2 you can use texelFetch, which takes integer pixel coordinates, to get a value from a texture.
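To check the texel-center formula outside the shader, a small C mirror of it can help. This is only a sketch: the `UV` struct and `uv_from_index` are hypothetical names, and integer arithmetic is used for the row and column so no fudge factor is needed on the CPU side.

```c
typedef struct { float u, v; } UV;

/* Hypothetical CPU-side mirror of getUVfromIndex: maps a linear texel index
   to the UV of that texel's center in a width x height texture. */
static UV uv_from_index(int index, int width, int height)
{
    int col = index % width;   /* x pixel coordinate */
    int row = index / width;   /* y pixel coordinate */
    UV uv;
    uv.u = ((float)col + 0.5f) / (float)width;
    uv.v = ((float)row + 0.5f) / (float)height;
    return uv;
}
```

For a 4x4 texture, index 5 lands on column 1, row 1, whose center is at (0.375, 0.375).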

GLSL for loop for grid neighbor calculation bug

For a little background this is for doing particle collisions with lookup textures on the GPU. I read the position texture with javascript and create a grid texture that contains the particles that are in the corresponding grid cell. The working example that is mentioned in the post can be viewed here: https://pacific-hamlet-84784.herokuapp.com/
The reason I want the buckets system is that it will allow me to do much fewer checks and the number of checks wouldn't increase with the number of particles.
For the actual problem description:
I am attempting to read from a lookup texture centered around a pixel. Let's say I have a texture that is 10x10 and I want to read the pixels around (4,2); I would read
(3,1),(3,2),(3,3)
(4,1),(4,2),(4,3)
(5,1),(5,2),(5,3)
The loop is a little more complicated but that is the general idea. If I make the loop look like the following
float xcenter = 5.0;
float ycenter = 5.0;
for(float i = -5.0; i < 5.0; i++){
for(float j = -5.0; j < 5.0; j++){
}
}
It works (however it goes over all of the particles which defeats the purpose), however if I calculate the value dynamically (which is what I need), then I get really bizarre behavior. Is this a problem with GLSL or a problem with my code? I output the values to an image and read the pixel values and they all appear to be within the right range. The problem is coming from using the for loop variables (i,j) to change a bucket index that is calculated outside of the loop, and use that variable to index a texture.
The entire shader code can be seen here:
(if I remove the hard coded 70, and remove the comments it breaks, but all of those values are between 0 and 144. This is where I am confused. I feel like this code should still work fine.).
uniform sampler2D pos;
uniform sampler2D buckets;
uniform vec2 res;
uniform vec2 screenSize;
uniform float size;
uniform float bounce;
const float width = &WIDTH;
const float height = &HEIGHT;
const float cellSize = &CELLSIZE;
const float particlesPerCell = &PPC;
const float bucketsWidth = &BW;
const float bucketsHeight = &BH;
$rand
void main(){
vec2 uv = gl_FragCoord.xy / res;
vec4 posi = texture2D( pos , uv );
float x = posi.x;
float y = posi.y;
float z = posi.z;
float target = 1.0 * size;
float x_bkt = floor( (x + (screenSize.x/2.0) )/cellSize);
float y_bkt = floor( (y + (screenSize.y/2.0) )/cellSize);
float x_bkt_ind_start = 70.0; //x_bkt * particlesPerCell;
float y_bkt_ind_start =70.0; //y_bkt * particlesPerCell;
//this is the code that is acting weirdly
for(float j = -144.0 ; j < 144.0; j++){
for(float i = -144.0 ; i < 144.0; i++){
float x_bkt_ind = (x_bkt_ind_start + i)/bucketsWidth;
float y_bkt_ind = (y_bkt_ind_start + j)/bucketsHeight;
vec4 ind2 = texture2D( buckets , vec2(x_bkt_ind,y_bkt_ind) );
if( abs(ind2.z - 1.0) > 0.00001 || x_bkt_ind < 0.0 || x_bkt_ind > 1.0 || y_bkt_ind < 0.0 || y_bkt_ind > 1.0 ){
continue;
}
vec4 pos2 = texture2D( pos , vec2(ind2.xy)/res );
vec2 diff = posi.xy - pos2.xy;
float dist = length(diff);
vec2 uvDiff = ind2.xy - gl_FragCoord.xy ;
float uvDist = abs(length(uvDiff));
if(dist <= target && uvDist >= 0.5){
float factor = (dist-target)/dist;
x = x - diff.x * factor * 0.5;
y = y - diff.y * factor * 0.5;
}
}
}
gl_FragColor = vec4( x, y, x_bkt_ind_start , y_bkt_ind_start);
}
EDIT:
To make my problem clear, what is happening is that when I do the first texture lookup, I get the position of the particle:
vec2 uv = gl_FragCoord.xy / res;
vec4 posi = texture2D( pos , uv );
After, I calculate the bucket that the particle is in:
float x_bkt = floor( (x + (screenSize.x/2.0) )/cellSize);
float y_bkt = floor( (y + (screenSize.y/2.0) )/cellSize);
float x_bkt_ind_start = x_bkt * particlesPerCell;
float y_bkt_ind_start = y_bkt * particlesPerCell;
All of this is correct. Like I am getting the correct values and if I set these as the output values of the shader and read the pixels they are the correct values. I also changed my implementation a little and this code works fine.
In order to test the for loop, I replaced the pixel lookup coordinates in the grid bucket by the pixel positions. I adapted the code and it works fine; however, I have to recalculate the buckets multiple times per frame, so the code is not very efficient. If instead of storing the pixel positions I store the uv coordinates of the pixels and then do a lookup using those uv positions:
//get the texture coordinate that is offset by the for loop
float x_bkt_ind = (x_bkt_ind_start + i)/bucketsWidth;
float y_bkt_ind = (y_bkt_ind_start + j)/bucketsHeight;
//use the texture coordinates to get the stored texture coordinate in the actual position table from the bucket table
vec4 ind2 = texture2D( buckets , vec2(x_bkt_ind,y_bkt_ind) );
and then I actually get the position
vec4 pos2 = texture2D( pos , vec2(ind2.xy)/res );
this pos2 value will be wrong. I am pretty sure that the ind2 value is correct because if instead of storing a pixel coordinate in that bucket table I store position values and remove the second texture lookup, the code runs fine. But using the second lookup causes the code to break.
In the original post, if I set the bucket to be any value, let's say the middle of the texture, and iterate over every possible bucket coordinate around the pixel, it works fine. However, if I calculate the bucket position and iterate over every pixel, it does not. I wonder if it has to do with the way GLSL compiles the shaders, and whether some sort of optimization it is making is causing the double texture lookups to break in the for loop. Or it is just a mistake in my code. I was able to get the single texture lookup in a for loop working when I just stored position values in the bucket texture.

What's the most efficient way in WebGL to find the min and max values of an RGBA float texture?

I'm storing floating-point gpgpu values in a webgl RGBA render texture, using only the r channel to store my data (I know I should be using a more efficient texture format but that's a separate concern).
Is there any efficient way / trick / hack to find the global min and max floating-point values without resorting to gl.readPixels? Note that just exporting the floating-point data is a hassle in webgl since readPixels doesn't yet support reading gl.FLOAT values.
This is the gist of how I'm currently doing things:
if (!gl) {
gl = renderer.getContext();
fb = gl.createFramebuffer();
pixels = new Uint8Array(SIZE * SIZE * 4);
}
if (!!gl) {
// TODO: there has to be a more efficient way of doing this than via readPixels...
gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, data.rtTemp2.__webglTexture, 0);
if (gl.checkFramebufferStatus(gl.FRAMEBUFFER) == gl.FRAMEBUFFER_COMPLETE) {
// HACK: we're pickling a single float value in every 4 bytes
// because webgl currently doesn't support reading gl.FLOAT
// textures.
gl.readPixels(0, 0, SIZE, SIZE, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
var max = -100, min = 100;
for (var i = 0; i < SIZE; ++i) {
for (var j = 0; j < SIZE; ++j) {
var o = 4 * (i * SIZE + j);
var x = pixels[o + 0];
var y = pixels[o + 1] / 255.0;
var z = pixels[o + 2] / 255.0;
var v = (x <= 1 ? -1.0 : 1.0) * y;
if (z > 0.0) { v /= z; }
max = Math.max(max, v);
min = Math.min(min, v);
}
}
// ...
}
}
This uses a fragment shader that outputs floating-point data in the following format, suitable for UNSIGNED_BYTE parsing:
<script id="fragmentShaderCompX" type="x-shader/x-fragment">
uniform sampler2D source1;
uniform sampler2D source2;
uniform vec2 resolution;
void main() {
vec2 uv = gl_FragCoord.xy / resolution.xy;
float v = texture2D(source1, uv).r + texture2D(source2, uv).r;
vec4 oo = vec4(1.0, abs(v), 1.0, 1.0);
if (v < 0.0) {
oo.x = 0.0;
}
v = abs(v);
if (v > 1.0) {
oo.y = 1.0;
oo.z = 1.0 / v;
}
gl_FragColor = oo;
}
</script>
Without compute shaders, the only thing that comes to mind is using a fragment shader to do that. For a 100x100 texture you could try rendering to a 20x20 grid texture, have the fragment shader do 5x5 lookups (with GL_NEAREST) to determine min and max, then download the 20x20 texture and do the rest on the CPU. Or do another pass to reduce it again. I don't know for which grid sizes it's more efficient though, you'll have to experiment. Maybe this helps, or googling "reduction gpu".
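The block-wise reduction described above can be prototyped on the CPU before moving it into a fragment shader. The following C sketch is a hypothetical illustration (the `MinMax` struct and `reduce_pass` are made-up names): each output cell plays the role of one fragment that scans its own tile of the input.

```c
typedef struct { float min, max; } MinMax;

/* One reduction pass: each output cell holds the min/max of a block x block
   tile of the input. src is size x size, row-major; size is assumed to be a
   multiple of block. dst must hold (size/block) * (size/block) entries. */
static void reduce_pass(const float *src, int size, int block, MinMax *dst)
{
    int out = size / block;
    for (int oy = 0; oy < out; ++oy) {
        for (int ox = 0; ox < out; ++ox) {
            float first = src[(oy * block) * size + ox * block];
            MinMax m = { first, first };
            for (int y = 0; y < block; ++y) {
                for (int x = 0; x < block; ++x) {
                    float v = src[(oy * block + y) * size + (ox * block + x)];
                    if (v < m.min) m.min = v;
                    if (v > m.max) m.max = v;
                }
            }
            dst[oy * out + ox] = m;
        }
    }
}
```

With a 100x100 input and `block` set to 5 this produces the 20x20 grid mentioned above; the remaining 400 values can then be scanned on the CPU or reduced again.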
Render one vertex into a 1x1 framebuffer, and within the shader sample the whole previously rendered texture. That way you are testing the texture on the GPU, which should be fast enough for real-time use (or not?); in any case it is definitely faster than doing it on the CPU, and the output would be the min/max value.
I also ran across a solution that tries mipmapping the texture and going through the different levels.
These links might be helpful:
http://www.gamedev.net/topic/559942-glsl--find-global-min-and-max-in-texture/
http://www.opengl.org/discussion_boards/showthread.php/175692-most-efficient-way-to-get-maximum-value-in-texture
Hope this helps.

How do I convert a vec4 rgba value to a float?

I packed some float data in a texture as unsigned_byte, my only option in WebGL. Now I would like to unpack it in the vertex shader. When I sample a pixel I get a vec4 which is really one of my floats. How do I convert from the vec4 to a float?
The following code is specifically for the iPhone 4 GPU using OpenGL ES 2.0. I have no experience with WebGL, so I can't claim to know how the code will work in that context. Furthermore, the main problem here is that highp float is not 32 bits but 24 bits.
My solution is for fragment shaders. I didn't try it in the vertex shader, but it shouldn't be any different. In order to use it you will need to get the RGBA texel from a sampler2D uniform and make sure that the values of each of the R, G, B and A channels are between 0.0 and 255.0. This is easy to achieve as follows:
highp vec4 rgba = texture2D(textureSamplerUniform, texcoordVarying)*255.0;
You should be aware, though, that the endianness of your machine will dictate the correct order of your bytes. The above code assumes that floats are stored in big-endian order. If you see that your results are wrong, just swap the order of the data by writing
rgba.rgba=rgba.abgr;
immediately after the line where you set it. Alternatively, swap the indices on rgba. I think the above line is more intuitive, though, and less prone to careless errors.
I am not sure if it works for all given input. I tested a large range of numbers and found that decode32 and encode32 are NOT exact inverses. I've also left out the code I used to test it.
#pragma STDGL invariant(all)
highp vec4 encode32(highp float f) {
highp float e =5.0;
highp float F = abs(f);
highp float Sign = step(0.0,-f);
highp float Exponent = floor(log2(F));
highp float Mantissa = (exp2(- Exponent) * F);
Exponent = floor(log2(F) + 127.0) + floor(log2(Mantissa));
highp vec4 rgba;
rgba[0] = 128.0 * Sign + floor(Exponent*exp2(-1.0));
rgba[1] = 128.0 * mod(Exponent,2.0) + mod(floor(Mantissa*128.0),128.0);
rgba[2] = floor(mod(floor(Mantissa*exp2(23.0 -8.0)),exp2(8.0)));
rgba[3] = floor(exp2(23.0)*mod(Mantissa,exp2(-15.0)));
return rgba;
}
highp float decode32(highp vec4 rgba) {
highp float Sign = 1.0 - step(128.0,rgba[0])*2.0;
highp float Exponent = 2.0 * mod(rgba[0],128.0) + step(128.0,rgba[1]) - 127.0;
highp float Mantissa = mod(rgba[1],128.0)*65536.0 + rgba[2]*256.0 +rgba[3] + float(0x800000);
highp float Result = Sign * exp2(Exponent) * (Mantissa * exp2(-23.0 ));
return Result;
}
void main()
{
highp float result;
highp vec4 rgba=encode32(-10.01);
result = decode32(rgba);
}
Here are some links on IEEE precision I found useful. Link1. Link2. Link3.
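The byte layout that encode32 builds (sign plus upper exponent bits, then the lowest exponent bit plus the upper mantissa bits, then the rest of the mantissa) is the big-endian byte order of an IEEE binary32 value. Assuming the host float is IEEE binary32, a CPU-side reference for filling or checking such a texture could look like the sketch below; `float_to_rgba8` and `rgba8_to_float` are hypothetical names, not part of the answer above.

```c
#include <stdint.h>
#include <string.h>

/* Assumes float is IEEE 754 binary32. Writes the bytes in big-endian order:
   out[0] = sign bit + upper 7 exponent bits,
   out[1] = lowest exponent bit + upper 7 mantissa bits,
   out[2], out[3] = remaining 16 mantissa bits. */
static void float_to_rgba8(float f, uint8_t out[4])
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    out[0] = (uint8_t)(bits >> 24);
    out[1] = (uint8_t)(bits >> 16);
    out[2] = (uint8_t)(bits >> 8);
    out[3] = (uint8_t)bits;
}

/* Inverse of float_to_rgba8: reassemble the bit pattern and reinterpret it. */
static float rgba8_to_float(const uint8_t in[4])
{
    uint32_t bits = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
                    ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because this works on the raw bit pattern, the round trip is exact on the CPU, which makes it a convenient baseline when comparing against what the shader decodes.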
Twerdster posted some excellent code in his answer, so all credit goes to him. I am posting this new answer because comments don't allow for nicely syntax-colored code blocks and I wanted to share some code. But if you like the code, please upvote Twerdster's original answer.
In his previous post, Twerdster mentioned that the decode and encode might not work for all values.
To test this further and validate the result, I made a Java program. While porting the code I tried to stay as close as possible to the shader code (therefore I implemented some helper functions).
Note: I also use a store/load function to simulate what happens when you write to/read from a texture.
I found out that:
You need a special case for zero
You might also need a special case for infinity, but I did not implement that, to keep the shader simple (i.e. faster)
Because of rounding errors the result was sometimes wrong, therefore:
Subtract 1 from the exponent when, because of rounding, the mantissa is not properly normalised (i.e. mantissa < 1)
Change float Mantissa = (exp2(- Exponent) * F); to float Mantissa = F/exp2(Exponent); to reduce precision errors
Use float Exponent = floor(log2(F)); to calculate the exponent (simplified by the new mantissa check)
Using these small modifications I got equal output on almost all inputs, and only small errors between the original and the encoded/decoded value when things do go wrong, while in Twerdster's original implementation rounding errors often resulted in the wrong exponent (and thus a result that was off by a factor of two).
Please note that this is a Java test application which I wrote to test the algorithm. I hope this will also work when ported to the GPU. If anybody tries to run it on a GPU, please leave a comment with your experience.
And here is the code, with a simple test that tries different numbers until it fails.
import java.io.PrintStream;
import java.util.Random;
public class BitPacking {
public static float decode32(float[] v)
{
float[] rgba = mult(255, v);
float sign = 1.0f - step(128.0f,rgba[0])*2.0f;
float exponent = 2.0f * mod(rgba[0],128.0f) + step(128.0f,rgba[1]) - 127.0f;
if(exponent==-127)
return 0;
float mantissa = mod(rgba[1],128.0f)*65536.0f + rgba[2]*256.0f +rgba[3] + ((float)0x800000);
return sign * exp2(exponent-23.0f) * mantissa ;
}
public static float[] encode32(float f) {
float F = abs(f);
if(F==0){
return new float[]{0,0,0,0};
}
float Sign = step(0.0f,-f);
float Exponent = floor(log2(F));
float Mantissa = F/exp2(Exponent);
if(Mantissa < 1)
Exponent -= 1;
Exponent += 127;
float[] rgba = new float[4];
rgba[0] = 128.0f * Sign + floor(Exponent*exp2(-1.0f));
rgba[1] = 128.0f * mod(Exponent,2.0f) + mod(floor(Mantissa*128.0f),128.0f);
rgba[2] = floor(mod(floor(Mantissa*exp2(23.0f -8.0f)),exp2(8.0f)));
rgba[3] = floor(exp2(23.0f)*mod(Mantissa,exp2(-15.0f)));
return mult(1/255.0f, rgba);
}
//shader build-in's
public static float exp2(float x){
return (float) Math.pow(2, x);
}
public static float[] step(float edge, float[] x){
float[] result = new float[x.length];
for(int i=0; i<x.length; i++)
result[i] = x[i] < edge ? 0.0f : 1.0f;
return result;
}
public static float step(float edge, float x){
return x < edge ? 0.0f : 1.0f;
}
public static float mod(float x, float y){
return x-y * floor(x/y);
}
public static float floor(float x){
return (float) Math.floor(x);
}
public static float pow(float x, float y){
return (float)Math.pow(x, y);
}
public static float log2(float x)
{
return (float) (Math.log(x)/Math.log(2));
}
public static float log10(float x)
{
return (float) (Math.log(x)/Math.log(10));
}
public static float abs(float x)
{
return (float)Math.abs(x);
}
public static float log(float x)
{
return (float)Math.log(x);
}
public static float exponent(float x)
{
return floor((float)(Math.log(x)/Math.log(10)));
}
public static float mantissa(float x)
{
return floor((float)(Math.log(x)/Math.log(10)));
}
//shorter matrix multiplication
private static float[] mult(float scalar, float[] w){
float[] result = new float[4];
for(int i=0; i<4; i++)
result[i] = scalar * w[i];
return result;
}
//simulate storage and retrieval in 4-channel/8-bit texture
private static float[] load(int[] v)
{
return new float[]{v[0]/255f, v[1]/255f, v[2]/255f, v[3]/255f};
}
private static int[] store(float[] v)
{
return new int[]{((int) (v[0]*255))& 0xff, ((int) (v[1]*255))& 0xff, ((int) (v[2]*255))& 0xff, ((int) (v[3]*255))& 0xff};
}
//testing until failure, and some specific hard-cases separately
public static void main(String[] args) {
//for(float v : new float[]{-2097151.0f}){ //small error here
for(float v : new float[]{3.4028233e+37f, 8191.9844f, 1.0f, 0.0f, 0.5f, 1.0f/3, 0.1234567890f, 2.1234567890f, -0.1234567890f, 1234.567f}){
float output = decode32(load(store(encode32(v))));
PrintStream stream = (v==output) ? System.out : System.err;
stream.println(v + " ?= " + output);
}
//System.exit(0);
Random r = new Random();
float max = 3200000f;
float min = -max;
boolean error = false;
int trials = 0;
while(!error){
float fin = min + r.nextFloat() * ((max - min) + 1);
float fout = decode32(load(store(encode32(fin))));
if(trials % 10000 == 0)
System.out.print('.');
if(trials % 1000000 == 0)
System.out.println();
if(fin != fout){
System.out.println();
System.out.println("correct trials = " + trials);
System.out.println(fin + " vs " + fout);
error = true;
}
trials++;
}
}
}
I tried Arjan's solution, but it returned invalid values for 0, 1, 2, 4. There was a bug with the packing of the exponent, which I changed so that the exponent takes one 8-bit float and the sign is packed with the mantissa:
//unpack a 32bit float from 4 8bit, [0;1] clamped floats
float unpackFloat4( vec4 _packed)
{
vec4 rgba = 255.0 * _packed;
float sign = step(-128.0, -rgba[1]) * 2.0 - 1.0;
float exponent = rgba[0] - 127.0;
if (abs(exponent + 127.0) < 0.001)
return 0.0;
float mantissa = mod(rgba[1], 128.0) * 65536.0 + rgba[2] * 256.0 + rgba[3] + float(0x800000);
return sign * exp2(exponent-23.0) * mantissa ;
}
//pack a 32bit float into 4 8bit, [0;1] clamped floats
vec4 packFloat(float f)
{
float F = abs(f);
if(F == 0.0)
{
return vec4(0,0,0,0);
}
float Sign = step(0.0, -f);
float Exponent = floor( log2(F));
float Mantissa = F/ exp2(Exponent);
//std::cout << " sign: " << Sign << ", exponent: " << Exponent << ", mantissa: " << Mantissa << std::endl;
//denormalized values if all exponent bits are zero
if(Mantissa < 1.0)
Exponent -= 1.0;
Exponent += 127.0;
vec4 rgba;
rgba[0] = Exponent;
rgba[1] = 128.0 * Sign + mod(floor(Mantissa * float(128.0)),128.0);
rgba[2] = floor( mod(floor(Mantissa* exp2(float(23.0 - 8.0))), exp2(8.0)));
rgba[3] = floor( exp2(23.0)* mod(Mantissa, exp2(-15.0)));
return (1.0 / 255.0) * rgba;
}
Since you didn't deign to give us the exact code you used to create and upload your texture, I can only guess at what you're doing.
You seem to be creating a JavaScript array of floating-point numbers. You then create a Uint8Array, passing that array to the constructor.
According to the WebGL spec (or rather, the spec that the WebGL spec refers to when ostensibly specifying this behavior), the conversion from floats to unsigned bytes happens in one of two ways, based on the destination. If the destination is considered "clamped", then it clamps the number to the destination range, namely [0, 255] for your case. If the destination is not considered "clamped", then it is taken modulo 2^8. The WebGL "specification" is sufficiently poor that it is not entirely clear whether the construction of Uint8Array is considered clamped or not. Whether clamped or taken modulo 2^8, the decimal point is chopped off and the integer value stored.
However, when you give this data to WebGL, you tell it to interpret the bytes as normalized unsigned integer values. This means that the input values on the range [0, 255] will be accessed by users of the texture as [0, 1] floating point values.
So if your input array had the value 183.45, the value in the Uint8Array would be 183. The value in the texture would be 183/255, or 0.718. If your input value was 0.45, the Uint8Array would hold 0, and the texture result would be 0.0.
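The two conversions in that example can be reproduced with a few lines of C. This is a sketch of the non-clamped case only, under the assumption that the input is finite and fits in a 32-bit integer; `to_uint8_wrapped` and `normalized` are hypothetical names.

```c
#include <stdint.h>

/* Non-clamped conversion as described above: drop the fraction, then wrap
   modulo 2^8. Assumes the input is finite and fits in a 32-bit integer. */
static uint8_t to_uint8_wrapped(float value)
{
    int32_t truncated = (int32_t)value;            /* chops off the decimal part */
    return (uint8_t)((uint32_t)truncated & 0xFFu); /* keep the low 8 bits */
}

/* What a shader sees when the byte is uploaded as a normalized value. */
static float normalized(uint8_t byte)
{
    return (float)byte / 255.0f;
}
```

So 183.45 becomes the byte 183 and is sampled as 183/255, while 0.45 becomes 0 and is sampled as 0.0.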
Now, because you passed the data as GL_RGBA, that means that every 4 unsigned bytes will be taken as a single texel. So every call to texture will fetch those particular four values (at the given texture coordinate, using the given filtering parameters), thus returning a vec4.
It is not clear what you intend to do with this floating-point data, so it is hard to make suggestions as to how best to pass float data to a shader. However, a general solution would be to use the OES_texture_float extension and actually create a texture that stores floating-point data. Of course, if it isn't available, you'll still have to find a way to do what you want.
BTW, Khronos really should be ashamed of themselves for even calling WebGL a specification. It barely specifies anything; it's just a bunch of references to other specifications, which makes finding the effects of anything exceedingly difficult.
You won't be able to just interpret the 4 unsigned bytes as the bits of a float value (which I assume you want) in a shader (at least not in GLES or WebGL, I think). What you can do is not store the float's bit representation in the 4 ubytes, but the bits of the mantissa (or the fixed point representation). For this you need to know the approximate range of the floats (I'll assume [0,1] here for simplicity, otherwise you have to scale differently, of course):
r = int(2^8 * f) mod 2^8; //most significant 8 bits
g = int(2^16 * f) mod 2^8; //next 8 bits
b = int(2^24 * f) mod 2^8; //only have 24 bits of precision anyway; f == 1.0 itself needs clamping first
Of course you can also work directly with the mantissa bits. And then in the shader you can just reconstruct it that way, using the fact that the components of the vec4 are all in [0,1] (each byte divided by 255):
f = (255 * v.r) / 2^8 + (255 * v.g) / 2^16 + (255 * v.b) / 2^24;
Although I'm not sure if this will result in the exact same value, the powers of two should help a bit there.
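One way to make the fixed-point idea concrete is the C sketch below. It assumes the input lies in [0, 1] and quantizes it to 24 bits, splitting the result across three bytes; `pack_unit_float` and `unpack_unit_float` are hypothetical names, and the unpack function works on raw bytes rather than on the normalized values a shader would receive.

```c
#include <stdint.h>

/* Packs f, assumed to lie in [0, 1], into three bytes of 24-bit fixed point. */
static void pack_unit_float(float f, uint8_t rgb[3])
{
    uint32_t q = (uint32_t)(f * 16777216.0f);   /* f * 2^24, truncated */
    if (q > 0xFFFFFFu) q = 0xFFFFFFu;           /* f == 1.0 saturates */
    rgb[0] = (uint8_t)(q >> 16);                /* most significant byte */
    rgb[1] = (uint8_t)(q >> 8);
    rgb[2] = (uint8_t)q;                        /* least significant byte */
}

/* Rebuilds the value from the three raw bytes. */
static float unpack_unit_float(const uint8_t rgb[3])
{
    uint32_t q = ((uint32_t)rgb[0] << 16) | ((uint32_t)rgb[1] << 8) | rgb[2];
    return (float)q / 16777216.0f;
}
```

Values that are exact multiples of 2^-24 survive the round trip unchanged; everything else is truncated to the nearest lower step.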

Random / noise functions for GLSL

As the GPU driver vendors don't usually bother to implement noiseX in GLSL, I'm looking for a "graphics randomization swiss army knife" utility function set, preferably optimised for use within GPU shaders. I prefer GLSL, but code in any language will do for me; I'm OK with translating it to GLSL on my own.
Specifically, I'd expect:
a) Pseudo-random functions - N-dimensional, uniform distribution over [-1,1] or over [0,1], calculated from M-dimensional seed (ideally being any value, but I'm OK with having the seed restrained to, say, 0..1 for uniform result distribution). Something like:
float random (T seed);
vec2 random2 (T seed);
vec3 random3 (T seed);
vec4 random4 (T seed);
// T being either float, vec2, vec3, vec4 - ideally.
b) Continuous noise like Perlin Noise - again, N-dimensional, +- uniform distribution, with constrained set of values and, well, looking good (some options to configure the appearance like Perlin levels could be useful too). I'd expect signatures like:
float noise (T coord, TT seed);
vec2 noise2 (T coord, TT seed);
// ...
I'm not very much into random number generation theory, so I'd most eagerly go for a pre-made solution, but I'd also appreciate answers like "here's a very good, efficient 1D rand(), and let me explain you how to make a good N-dimensional rand() on top of it..." .
For very simple pseudorandom-looking stuff, I use this oneliner that I found on the internet somewhere:
float rand(vec2 co){
return fract(sin(dot(co, vec2(12.9898, 78.233))) * 43758.5453);
}
You can also generate a noise texture using whatever PRNG you like, then upload this in the normal fashion and sample the values in your shader; I can dig up a code sample later if you'd like.
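For the noise-texture route, the CPU side can be as small as the sketch below. It is only an illustration: xorshift32 is one possible PRNG chosen here as an assumption, `fill_noise` is a hypothetical name, and the resulting bytes would still have to be uploaded as a single-channel 8-bit texture by the usual means.

```c
#include <stddef.h>
#include <stdint.h>

/* xorshift32 (Marsaglia); the state must never be zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fills count bytes (width * height for a single-channel 8-bit image)
   with pseudo-random values derived from seed. */
static void fill_noise(uint8_t *pixels, size_t count, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;   /* avoid the all-zero state */
    for (size_t i = 0; i < count; ++i)
        pixels[i] = (uint8_t)(xorshift32(&state) >> 24);  /* top 8 bits */
}
```

The same seed always yields the same texture, which is handy when a result needs to be reproducible across runs.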
Also, check out this file for GLSL implementations of Perlin and Simplex noise, by Stefan Gustavson.
It occurs to me that you could use a simple integer hash function and insert the result into a float's mantissa. IIRC the GLSL spec guarantees 32-bit unsigned integers and IEEE binary32 float representation so it should be perfectly portable.
I gave this a try just now. The results are very good: it looks exactly like static with every input I tried, no visible patterns at all. In contrast the popular sin/fract snippet has fairly pronounced diagonal lines on my GPU given the same inputs.
One disadvantage is that it requires GLSL v3.30. And although it seems fast enough, I haven't empirically quantified its performance. AMD's Shader Analyzer claims 13.33 pixels per clock for the vec2 version on a HD5870. Contrast with 16 pixels per clock for the sin/fract snippet. So it is certainly a little slower.
Here's my implementation. I left it in various permutations of the idea to make it easier to derive your own functions from.
/*
static.frag
by Spatial
05 July 2013
*/
#version 330 core
uniform float time;
out vec4 fragment;
// A single iteration of Bob Jenkins' One-At-A-Time hashing algorithm.
uint hash( uint x ) {
x += ( x << 10u );
x ^= ( x >> 6u );
x += ( x << 3u );
x ^= ( x >> 11u );
x += ( x << 15u );
return x;
}
// Compound versions of the hashing algorithm I whipped together.
uint hash( uvec2 v ) { return hash( v.x ^ hash(v.y) ); }
uint hash( uvec3 v ) { return hash( v.x ^ hash(v.y) ^ hash(v.z) ); }
uint hash( uvec4 v ) { return hash( v.x ^ hash(v.y) ^ hash(v.z) ^ hash(v.w) ); }
// Construct a float with half-open range [0:1] using low 23 bits.
// All zeroes yields 0.0, all ones yields the next smallest representable value below 1.0.
float floatConstruct( uint m ) {
const uint ieeeMantissa = 0x007FFFFFu; // binary32 mantissa bitmask
const uint ieeeOne = 0x3F800000u; // 1.0 in IEEE binary32
m &= ieeeMantissa; // Keep only mantissa bits (fractional part)
m |= ieeeOne; // Add fractional part to 1.0
float f = uintBitsToFloat( m ); // Range [1:2]
return f - 1.0; // Range [0:1]
}
// Pseudo-random value in half-open range [0:1].
float random( float x ) { return floatConstruct(hash(floatBitsToUint(x))); }
float random( vec2 v ) { return floatConstruct(hash(floatBitsToUint(v))); }
float random( vec3 v ) { return floatConstruct(hash(floatBitsToUint(v))); }
float random( vec4 v ) { return floatConstruct(hash(floatBitsToUint(v))); }
void main()
{
vec3 inputs = vec3( gl_FragCoord.xy, time ); // Spatial and temporal inputs
float rand = random( inputs ); // Random per-pixel value
vec3 luma = vec3( rand ); // Expand to RGB
fragment = vec4( luma, 1.0 );
}
Screenshot:
I inspected the screenshot in an image editing program. There are 256 colours and the average value is 127, meaning the distribution is uniform and covers the expected range.
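The hash and the mantissa trick can also be exercised on the CPU to sanity-check the output range. The following C sketch mirrors `hash( uint x )` and `floatConstruct` from the shader, assuming the host float is IEEE binary32; the names `hash_u32` and `float_construct` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Same mixing steps as hash(uint) in the shader above. */
static uint32_t hash_u32(uint32_t x)
{
    x += x << 10;
    x ^= x >> 6;
    x += x << 3;
    x ^= x >> 11;
    x += x << 15;
    return x;
}

/* Builds a float in [0, 1) from the low 23 bits of m (assumes IEEE binary32). */
static float float_construct(uint32_t m)
{
    uint32_t bits = (m & 0x007FFFFFu) | 0x3F800000u;  /* mantissa bits | 1.0 */
    float f;
    memcpy(&f, &bits, sizeof f);                      /* f is in [1, 2) */
    return f - 1.0f;                                  /* shift to [0, 1) */
}
```

Calling `float_construct(hash_u32(i))` for successive integers `i` gives a quick way to dump values and inspect their distribution offline.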
Gustavson's implementation uses a 1D texture
No it doesn't, not since 2005. It's just that people insist on downloading the old version. The version that is on the link you supplied uses only 8-bit 2D textures.
The new version by Ian McEwan of Ashima and myself does not use a texture, but runs at around half the speed on typical desktop platforms with lots of texture bandwidth. On mobile platforms, the textureless version might be faster because texturing is often a significant bottleneck.
Our actively maintained source repository is:
https://github.com/ashima/webgl-noise
A collection of both the textureless and texture-using versions of noise is here (using only 2D textures):
http://www.itn.liu.se/~stegu/simplexnoise/GLSL-noise-vs-noise.zip
If you have any specific questions, feel free to e-mail me directly (my email address can be found in the classicnoise*.glsl sources.)
Gold Noise
// Gold Noise ©2015 dcerisano@standard3d.com
// - based on the Golden Ratio
// - uniform normalized distribution
// - fastest static noise generator function (also runs at low precision)
// - use with indicated fractional seeding method.
float PHI = 1.61803398874989484820459; // Φ = Golden Ratio
float gold_noise(in vec2 xy, in float seed){
return fract(tan(distance(xy*PHI, xy)*seed)*xy.x);
}
See Gold Noise in your browser right now!
This function has improved random distribution over the current function in @appas' answer as of Sept 9, 2017.
The @appas function is also incomplete, since no seed is supplied (uv is not a seed; it is the same for every frame), and it does not work on low-precision chipsets. Gold Noise runs at low precision by default (much faster).
There is also a nice implementation described here by McEwan and @StefanGustavson that looks like Perlin noise, but "does not require any setup, i.e. not textures nor uniform arrays. Just add it to your shader source code and call it wherever you want".
That's very handy, especially given that Gustavson's earlier implementation, which @dep linked to, uses a 1D texture, which is not supported in GLSL ES (the shader language of WebGL).
After the initial posting of this question in 2010, a lot has changed in the realm of good random functions and hardware support for them.
Looking at the accepted answer from today's perspective, the uniformity of the random numbers drawn from this algorithm is very poor. The uniformity also depends heavily on the magnitude of the input values, and visible artifacts/patterns become apparent when sampling from it for e.g. ray/path tracing applications.
Many different functions (most of them integer hashes) have been devised for this task, for different input and output dimensionality, most of which are evaluated in the 2020 JCGT paper Hash Functions for GPU Rendering. Depending on your needs you could select a function from the list proposed in that paper and simply copy it from the accompanying Shadertoy.
One that isn't covered in this paper but that has served me very well without any noticeable patterns on any input magnitude values is also one that I want to highlight.
Other classes of algorithms use low-discrepancy sequences to draw pseudo-random numbers from, such as the Sobol sequence with Owen-Nayar scrambling. Eric Heitz has done some amazing research in this area as well, with his paper A Low-Discrepancy Sampler that Distributes Monte Carlo Errors as a Blue Noise in Screen Space.
Another example of this is the (so far latest) JCGT paper Practical Hash-based Owen Scrambling, which applies Owen scrambling to a different hash function (namely Laine-Karras).
Yet other classes use algorithms that produce noise patterns with desirable frequency spectra, such as blue noise, which is particularly "pleasing" to the eyes.
(I realize that good StackOverflow answers should provide the algorithms as source code and not as links because those can break, but there are way too many different algorithms nowadays and I intend for this answer to be a summary of known-good algorithms today)
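As one illustration of the integer hashes evaluated in Hash Functions for GPU Rendering, here is a minimal C sketch of the 1D PCG hash. The constants are written from memory and should be checked against the paper's accompanying code before use. The function names pcg_hash and pcg_unit_float are made up for this example. In GLSL ES 3.00 or desktop GLSL the same arithmetic works on uint.

```c
#include <stdint.h>

/* PCG-style hash: one LCG step followed by an RXS-M-XS output permutation.
   Constants are assumed from the reference code, not verified here. */
uint32_t pcg_hash(uint32_t v) {
    uint32_t state = v * 747796405u + 2891336453u;
    uint32_t word = ((state >> ((state >> 28) + 4u)) ^ state) * 277803737u;
    return (word >> 22) ^ word;
}

/* Map the hash to [0,1) using its top 24 bits, which a 32-bit float
   can represent exactly, so 1.0 is never returned. */
float pcg_unit_float(uint32_t v) {
    return (float)(pcg_hash(v) >> 8) * (1.0f / 16777216.0f);
}
```

For a per-pixel value you would typically combine the pixel coordinates and a frame index into one integer seed before hashing. The paper also covers multi-dimensional variants that avoid this step.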
Do use this:
highp float rand(vec2 co)
{
highp float a = 12.9898;
highp float b = 78.233;
highp float c = 43758.5453;
highp float dt= dot(co.xy ,vec2(a,b));
highp float sn= mod(dt,3.14);
return fract(sin(sn) * c);
}
Don't use this:
float rand(vec2 co){
return fract(sin(dot(co.xy ,vec2(12.9898,78.233))) * 43758.5453);
}
You can find the explanation in Improvements to the canonical one-liner GLSL rand() for OpenGL ES 2.0
hash:
Nowadays WebGL 2.0 is available, so integers can be used in (Web)GLSL.
-> For a quality, portable hash (at a cost similar to that of the ugly float hashes) we can now use "serious" hashing techniques.
IQ implemented some in https://www.shadertoy.com/view/XlXcW4 (and more)
E.g.:
const uint k = 1103515245U; // GLIB C
//const uint k = 134775813U; // Delphi and Turbo Pascal
//const uint k = 20170906U; // Today's date (use three days ago's date if you want a prime)
//const uint k = 1664525U; // Numerical Recipes
vec3 hash( uvec3 x )
{
x = ((x>>8U)^x.yzx)*k;
x = ((x>>8U)^x.yzx)*k;
x = ((x>>8U)^x.yzx)*k;
return vec3(x)*(1.0/float(0xffffffffU));
}
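To check the integer hash above outside a shader, it can be ported to C. This is a rough sketch under a few assumptions: the uvec3 and vec3 structs and the name hash3 are stand-ins invented for this example, and the .yzx swizzle is emulated by taking a snapshot of the components before each round.

```c
#include <stdint.h>

typedef struct { uint32_t x, y, z; } uvec3;   /* stand-in for GLSL uvec3 */
typedef struct { float x, y, z; } vec3;       /* stand-in for GLSL vec3 */

vec3 hash3(uvec3 v) {
    const uint32_t k = 1103515245u;
    const float scale = 1.0f / (float)0xffffffffu;
    vec3 out;
    int i;
    for (i = 0; i < 3; ++i) {
        /* Snapshot of v.yzx taken before any component is overwritten,
           matching the component-wise vector operation in GLSL. */
        uvec3 s = { v.y, v.z, v.x };
        v.x = ((v.x >> 8) ^ s.x) * k;
        v.y = ((v.y >> 8) ^ s.y) * k;
        v.z = ((v.z >> 8) ^ s.z) * k;
    }
    out.x = (float)v.x * scale;
    out.y = (float)v.y * scale;
    out.z = (float)v.z * scale;
    return out;
}
```

Two things are visible from the port. An all-zero input maps to an all-zero output, so it is worth offsetting the inputs if (0,0,0) can occur. Also, the conversion to float rounds large values up, so the result lies in the closed range [0,1] rather than strictly below 1.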
Just found this version of 3D noise for the GPU; allegedly it is the fastest one available:
#ifndef __noise_hlsl_
#define __noise_hlsl_
// hash based 3d value noise
// function taken from https://www.shadertoy.com/view/XslGRr
// Created by inigo quilez - iq/2013
// License Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
// ported from GLSL to HLSL
float hash( float n )
{
return frac(sin(n)*43758.5453);
}
float noise( float3 x )
{
// The noise function returns a value in the range 0.0f -> 1.0f, because hash() returns frac() of its argument
float3 p = floor(x);
float3 f = frac(x);
f = f*f*(3.0-2.0*f);
float n = p.x + p.y*57.0 + 113.0*p.z;
return lerp(lerp(lerp( hash(n+0.0), hash(n+1.0),f.x),
lerp( hash(n+57.0), hash(n+58.0),f.x),f.y),
lerp(lerp( hash(n+113.0), hash(n+114.0),f.x),
lerp( hash(n+170.0), hash(n+171.0),f.x),f.y),f.z);
}
#endif
A straight, jagged version of 1d Perlin, essentially a random lfo zigzag.
half rn(float xx){
half x0=floor(xx);
half x1=x0+1;
half v0 = frac(sin (x0*.014686)*31718.927+x0);
half v1 = frac(sin (x1*.014686)*31718.927+x1);
return (v0*(1-frac(xx))+v1*(frac(xx)))*2-1*sin(xx);
}
I have also found 1D, 2D, 3D and 4D Perlin noise, Voronoi and so forth on the tutorial website of Inigo Quilez (of Shadertoy); he has full, fast implementations and code for them.
I have translated one of Ken Perlin's Java implementations into GLSL and used it in a couple projects on ShaderToy.
Below is the GLSL interpretation I did:
int b(int N, int B) { return N>>B & 1; }
int T[] = int[](0x15,0x38,0x32,0x2c,0x0d,0x13,0x07,0x2a);
int A[] = int[](0,0,0);
int b(int i, int j, int k, int B) { return T[b(i,B)<<2 | b(j,B)<<1 | b(k,B)]; }
int shuffle(int i, int j, int k) {
return b(i,j,k,0) + b(j,k,i,1) + b(k,i,j,2) + b(i,j,k,3) +
b(j,k,i,4) + b(k,i,j,5) + b(i,j,k,6) + b(j,k,i,7) ;
}
float K(int a, vec3 uvw, vec3 ijk)
{
float s = float(A[0]+A[1]+A[2])/6.0;
float x = uvw.x - float(A[0]) + s,
y = uvw.y - float(A[1]) + s,
z = uvw.z - float(A[2]) + s,
t = 0.6 - x * x - y * y - z * z;
int h = shuffle(int(ijk.x) + A[0], int(ijk.y) + A[1], int(ijk.z) + A[2]);
A[a]++;
if (t < 0.0)
return 0.0;
int b5 = h>>5 & 1, b4 = h>>4 & 1, b3 = h>>3 & 1, b2= h>>2 & 1, b = h & 3;
float p = b==1?x:b==2?y:z, q = b==1?y:b==2?z:x, r = b==1?z:b==2?x:y;
p = (b5==b3 ? -p : p); q = (b5==b4 ? -q : q); r = (b5!=(b4^b3) ? -r : r);
t *= t;
return 8.0 * t * t * (p + (b==0 ? q+r : b2==0 ? q : r));
}
float noise(float x, float y, float z)
{
float s = (x + y + z) / 3.0;
vec3 ijk = vec3(int(floor(x+s)), int(floor(y+s)), int(floor(z+s)));
s = float(ijk.x + ijk.y + ijk.z) / 6.0;
vec3 uvw = vec3(x - float(ijk.x) + s, y - float(ijk.y) + s, z - float(ijk.z) + s);
A[0] = A[1] = A[2] = 0;
int hi = uvw.x >= uvw.z ? uvw.x >= uvw.y ? 0 : 1 : uvw.y >= uvw.z ? 1 : 2;
int lo = uvw.x < uvw.z ? uvw.x < uvw.y ? 0 : 1 : uvw.y < uvw.z ? 1 : 2;
return K(hi, uvw, ijk) + K(3 - hi - lo, uvw, ijk) + K(lo, uvw, ijk) + K(0, uvw, ijk);
}
I translated it from Appendix B from Chapter 2 of Ken Perlin's Noise Hardware at this source:
https://www.csee.umbc.edu/~olano/s2002c36/ch02.pdf
Here is a public shader I made on Shadertoy that uses the posted noise function:
https://www.shadertoy.com/view/3slXzM
Some other good sources I found on the subject of noise during my research include:
https://thebookofshaders.com/11/
https://mzucker.github.io/html/perlin-noise-math-faq.html
https://rmarcus.info/blog/2018/03/04/perlin-noise.html
http://flafla2.github.io/2014/08/09/perlinnoise.html
https://mrl.nyu.edu/~perlin/noise/
https://rmarcus.info/blog/assets/perlin/perlin_paper.pdf
https://developer.nvidia.com/gpugems/GPUGems/gpugems_ch05.html
I highly recommend The Book of Shaders, as it provides a great interactive explanation not only of noise but of other shader concepts as well.
EDIT:
Might be able to optimize the translated code by using some of the hardware-accelerated functions available in GLSL. Will update this post if I end up doing this.
lygia, a multi-language shader library
If you don't want to copy / paste the functions into your shader, you can also use lygia, a multi-language shader library. It contains a few generative functions like cnoise, fbm, noised, pnoise, random, snoise in both GLSL and HLSL, and many other awesome functions as well. For this to work, it relies on #include "file", which is defined by the Khronos GLSL standard and supported by most engines and environments (like glslViewer, the glsl-canvas VS Code plugin, Unity, etc.).
Example: cnoise
Using cnoise.glsl with #include:
#ifdef GL_ES
precision mediump float;
#endif
uniform vec2 u_resolution;
uniform float u_time;
#include "lygia/generative/cnoise.glsl"
void main (void) {
vec2 st = gl_FragCoord.xy / u_resolution.xy;
vec3 color = vec3(cnoise(vec3(st * 5.0, u_time)));
gl_FragColor = vec4(color, 1.0);
}
To run this example I used glslViewer.
Below is an example of how to add white noise to a rendered texture.
The solution is to use two textures: the original one and a pure white noise texture, like this one: wiki white noise
private static final String VERTEX_SHADER =
"uniform mat4 uMVPMatrix;\n" +
"uniform mat4 uMVMatrix;\n" +
"uniform mat4 uSTMatrix;\n" +
"attribute vec4 aPosition;\n" +
"attribute vec4 aTextureCoord;\n" +
"varying vec2 vTextureCoord;\n" +
"varying vec4 vInCamPosition;\n" +
"void main() {\n" +
" vTextureCoord = (uSTMatrix * aTextureCoord).xy;\n" +
" gl_Position = uMVPMatrix * aPosition;\n" +
"}\n";
private static final String FRAGMENT_SHADER =
"precision mediump float;\n" +
"uniform sampler2D sTextureUnit;\n" +
"uniform sampler2D sNoiseTextureUnit;\n" +
"uniform float uNoiseFactor;\n" +
"varying vec2 vTextureCoord;\n" +
"varying vec4 vInCamPosition;\n" +
"void main() {\n" +
" gl_FragColor = texture2D(sTextureUnit, vTextureCoord);\n" +
" vec4 vRandChosenColor = texture2D(sNoiseTextureUnit, fract(vTextureCoord + uNoiseFactor));\n" +
" gl_FragColor.r += (0.05 * vRandChosenColor.r);\n" +
" gl_FragColor.g += (0.05 * vRandChosenColor.g);\n" +
" gl_FragColor.b += (0.05 * vRandChosenColor.b);\n" +
"}\n";
The fragment shader contains the uniform uNoiseFactor, which the main application updates on every render:
float noiseFactor = (float)(mRand.nextInt() % 1000)/1000;
// Look up the float uniform itself, not the noise texture sampler
int noiseFactorUniformHandle = GLES20.glGetUniformLocation( mProgram, "uNoiseFactor");
GLES20.glUniform1f(noiseFactorUniformHandle, noiseFactor);
FWIW I had the same question and needed it to be implemented in WebGL 1.0, so I couldn't use a few of the examples given in previous answers. I tried the Gold Noise mentioned before, but the use of PHI doesn't really click for me: distance(xy * PHI, xy) * seed just equals length(xy) * (PHI - 1.0) * seed, so I don't see how the magic of PHI should be put to work when it gets directly multiplied by seed.
Anyway, I did something similar, just without PHI, and instead added some variation in another place. Basically, I take the tan of the distance between xy and some random point lying outside of the frame to the top right, and then multiply it by the distance between xy and another such random point lying to the bottom left (so there is no accidental match between these points). It looks pretty decent as far as I can see. Click to generate new frames.
(function main() {
const dim = [512, 512];
twgl.setDefaults({ attribPrefix: "a_" });
const gl = twgl.getContext(document.querySelector("canvas"));
gl.canvas.width = dim[0];
gl.canvas.height = dim[1];
const bfi = twgl.primitives.createXYQuadBufferInfo(gl);
const pgi = twgl.createProgramInfo(gl, ["vs", "fs"]);
gl.canvas.onclick = (() => {
twgl.bindFramebufferInfo(gl, null);
gl.useProgram(pgi.program);
twgl.setUniforms(pgi, {
u_resolution: dim,
u_seed: Array(4).fill().map(Math.random)
});
twgl.setBuffersAndAttributes(gl, pgi, bfi);
twgl.drawBufferInfo(gl, bfi);
});
})();
<script src="https://twgljs.org/dist/4.x/twgl-full.min.js"></script>
<script id="vs" type="x-shader/x-vertex">
attribute vec4 a_position;
attribute vec2 a_texcoord;
void main() {
gl_Position = a_position;
}
</script>
<script id="fs" type="x-shader/x-fragment">
precision highp float;
uniform vec2 u_resolution;
uniform vec2 u_seed[2];
void main() {
float uni = fract(
tan(distance(
gl_FragCoord.xy,
u_resolution * (u_seed[0] + 1.0)
)) * distance(
gl_FragCoord.xy,
u_resolution * (u_seed[1] - 2.0)
)
);
gl_FragColor = vec4(uni, uni, uni, 1.0);
}
</script>
<canvas></canvas>

Resources