GLSL odd even merge sort - sorting

I'm trying to understand the odd-ever merge sort example on the gpugems site but I'm having trouble figuring out some of what they are passing into the uniforms. Here's the shader in it's entirety.
uniform vec3 Param1;
uniform vec3 Param2;
uniform sampler2D Data;
#define OwnPos gl_TexCoord[0]
// contents of the uniform data fields
#define TwoStage Param1.x
#define Pass_mod_Stage Param1.y
#define TwoStage_PmS_1 Param1.z
#define Width Param2.x
#define Height Param2.y
#define Pass Param2.z
void main(void){
// get self
vec4 self = texture2D(Data, OwnPos.xy);
float i = floor(OwnPos.x * Width) + floor(OwnPos.y * Height) * Width;
// my position within the range to merge
float j = floor(mod(i, TwoStage));
float compare;
if ( (j < Pass_mod_Stage) || (j > TwoStage_PmS_1) )
// must copy -> compare with self
compare = 0.0;
else
// must sort
if ( mod((j + Pass_mod_Stage) / Pass, 2.0) < 1.0)
// we are on the left side -> compare with partner on the right
compare = 1.0;
else
// we are on the right side -> compare with partner on the left
compare = -1.0;
// get the partner
float adr = i + compare * Pass;
vec4 partner = texture2D(Data, vec2(floor(mod(adr, Width)) / Width, floor(adr / Width) / Height));
// on the left it's a < operation; on the right it's a >= operation
gl_FragColor = (self.x * compare < partner.x * compare) ? self : partner;
}
The part that's tripping me up is figuring out what they are assigning to Param1 and Param2.z.
Param2.x and Param2.y are just width and height of the image. Is the pass variable just an incrementing number each time through the loop?
Param1.x, Param1.y, and Param1.z have me completely stumped. Is there supposed to be something going on in the CPU side of this program that they are not including here?
Any help or clarity would be greatly appreciated! Thank you

The code for this example is available here.
The line 298 of the main.cpp file is the one that you need:
glUniform3fARB(oddevenMergeSort.getUniformLocation("Param1"),
float(pstage+pstage),
float(ppass%pstage),
float((pstage+pstage)-(ppass%pstage)-1)
);
glUniform3fARB(oddevenMergeSort.getUniformLocation("Param2"),
float(width),
float(height),
float(ppass)
);
With int ppass = 1<<pass; and int pstage = 1<<stage;

Related

GLSL uv lookup and precision with FBO / RenderTarget in Three.js

My application is coded in Javascript + Three.js / WebGL + GLSL. I have 200 curves, each one made of 85 points. To animate the curves I add a new point and remove the last.
So I made a positions shader that stores the new positions onto a texture (1) and the lines shader that writes the positions for all curves on another texture (2).
The goal is to use textures as arrays: I know the first and last index of a line, so I need to convert those indices to uv coordinates.
I use FBOHelper to debug FBOs.
1) This 1D texture contains the new points for each curve (200 in total): positionTexture
2) And these are the 200 curves, with all their points, one after the other: linesTexture
The black parts are the BUG here. Those texels shouldn't be black.
How does it work: at each frame the shader looks up the new point for each line in the positionTexture and updates the linesTextures accordingly, with a for loop like this:
#define LINES_COUNT = 200
#define LINE_POINTS = 85 // with 100 it works!!!
// Then in main()
vec2 uv = gl_FragCoord.xy / resolution.xy;
for (float i = 0.0; i < LINES_COUNT; i += 1.0) {
float startIdx = i * LINE_POINTS; // line start index
float endIdx = beginIdx + LINE_POINTS - 1.0; // line end index
vec2 lastCell = getUVfromIndex(endIdx); // last uv coordinate reserved for current line
if (match(lastCell, uv)) {
pos = texture2D( positionTexture, vec2((i / LINES_COUNT) + minFloat, 0.0)).xyz;
} else if (index >= startIdx && index < endIdx) {
pos = texture2D( lineTexture, getNextUV(uv) ).xyz;
}
}
This works, but it's slightly buggy when I have many lines (150+): likely a precision problem. I'm not sure if the functions I wrote to look up the textures are right. I wrote functions like getNextUV(uv) to get the value from the next index (converted to uv coordinates) and copy to the previous. Or match(xy, uv) to know if the current fragment is the texel I want.
I though I could simply use the classic formula:
index = uv.y * width + uv.x
But it's more complicated than that. For example match():
// Wether a point XY is within a UV coordinate
float size = 132.0; // width and height of texture
float unit = 1.0 / size;
float minFloat = unit / size;
bool match(vec2 point, vec2 uv) {
vec2 p = point;
float x = floor(p.x / unit) * unit;
float y = floor(p.y / unit) * unit;
return x <= uv.x && x + unit > uv.x && y <= uv.y && y + unit > uv.y;
}
Or getUVfromIndex():
vec2 getUVfromIndex(float index) {
float row = floor(index / size); // Example: 83.56 / 10 = 8
float col = index - (row * size); // Example: 83.56 - (8 * 10) = 3.56
col = col / size + minFloat; // u = 0.357
row = row / size + minFloat; // v = 0.81
return vec2(col, row);
}
Can someone explain what's the most efficient way to lookup values in a texture, by getting a uv coordinate from index value?
Texture coordinates go from the edge of pixels not the centers so your formula to compute a UV coordinates needs to be
u = (xPixelCoord + .5) / widthOfTextureInPixels;
v = (yPixelCoord + .5) / heightOfTextureInPixels;
So I'm guessing you want getUVfromIndex to be
uniform vec2 sizeOfTexture; // allow texture to be any size
vec2 getUVfromIndex(float index) {
float widthOfTexture = sizeOfTexture.x;
float col = mod(index, widthOfTexture);
float row = floor(index / widthOfTexture);
return (vec2(col, row) + .5) / sizeOfTexture;
}
Or, based on some other experience with math issues in shaders you might need to fudge index
uniform vec2 sizeOfTexture; // allow texture to be any size
vec2 getUVfromIndex(float index) {
float fudgedIndex = index + 0.1;
float widthOfTexture = sizeOfTexture.x;
float col = mod(fudgedIndex, widthOfTexture);
float row = floor(fudgedIndex / widthOfTexture);
return (vec2(col, row) + .5) / sizeOfTexture;
}
If you're in WebGL2 you can use texelFetch which takes integer pixel coordinates to get a value from a texture

GLSL for loop for grid neighbor calculation bug

For a little background this is for doing particle collisions with lookup textures on the GPU. I read the position texture with javascript and create a grid texture that contains the particles that are in the corresponding grid cell. The working example that is mentioned in the post can be viewed here: https://pacific-hamlet-84784.herokuapp.com/
The reason I want the buckets system is that it will allow me to do much fewer checks and the number of checks wouldn't increase with the number of particles.
For the actual problem description:
I am attempting to read from a lookup texture centered around a pixel (lets say i have a texture that is 10x10, and I want to read the pixels around (4,2), i would read
(3,1),(3,2)(3,3)
(4,1),(4,2)(4,3)
(5,1),(5,2)(5,3)
The loop is a little more complicated but that is the general idea. If I make the loop look like the following
float xcenter = 5.0;
float ycenter = 5.0;
for(float i = -5.0; i < 5.0; i++){
for(float j = -5.0; j < 5.0; j++){
}
}
It works (however it goes over all of the particles which defeats the purpose), however if I calculate the value dynamically (which is what I need), then I get really bizarre behavior. Is this a problem with GLSL or a problem with my code? I output the values to an image and read the pixel values and they all appear to be within the right range. The problem is coming from using the for loop variables (i,j) to change a bucket index that is calculated outside of the loop, and use that variable to index a texture.
The entire shader code can be seen here:
(if I remove the hard coded 70, and remove the comments it breaks, but all of those values are between 0 and 144. This is where I am confused. I feel like this code should still work fine.).
uniform sampler2D pos;
uniform sampler2D buckets;
uniform vec2 res;
uniform vec2 screenSize;
uniform float size;
uniform float bounce;
const float width = &WIDTH;
const float height = &HEIGHT;
const float cellSize = &CELLSIZE;
const float particlesPerCell = &PPC;
const float bucketsWidth = &BW;
const float bucketsHeight = &BH;
$rand
void main(){
vec2 uv = gl_FragCoord.xy / res;
vec4 posi = texture2D( pos , uv );
float x = posi.x;
float y = posi.y;
float z = posi.z;
float target = 1.0 * size;
float x_bkt = floor( (x + (screenSize.x/2.0) )/cellSize);
float y_bkt = floor( (y + (screenSize.y/2.0) )/cellSize);
float x_bkt_ind_start = 70.0; //x_bkt * particlesPerCell;
float y_bkt_ind_start =70.0; //y_bkt * particlesPerCell;
//this is the code that is acting weirdly
for(float j = -144.0 ; j < 144.0; j++){
for(float i = -144.0 ; i < 144.0; i++){
float x_bkt_ind = (x_bkt_ind_start + i)/bucketsWidth;
float y_bkt_ind = (y_bkt_ind_start + j)/bucketsHeight;
vec4 ind2 = texture2D( buckets , vec2(x_bkt_ind,y_bkt_ind) );
if( abs(ind2.z - 1.0) > 0.00001 || x_bkt_ind < 0.0 || x_bkt_ind > 1.0 || y_bkt_ind < 0.0 || y_bkt_ind > 1.0 ){
continue;
}
vec4 pos2 = texture2D( pos , vec2(ind2.xy)/res );
vec2 diff = posi.xy - pos2.xy;
float dist = length(diff);
vec2 uvDiff = ind2.xy - gl_FragCoord.xy ;
float uvDist = abs(length(uvDiff));
if(dist <= target && uvDist >= 0.5){
float factor = (dist-target)/dist;
x = x - diff.x * factor * 0.5;
y = y - diff.y * factor * 0.5;
}
}
}
gl_FragColor = vec4( x, y, x_bkt_ind_start , y_bkt_ind_start);
}
EDIT:
To make my problem clear, what is happening is that when I do the first texture lookup, I get the position of the particle:
vec2 uv = gl_FragCoord.xy / res;
vec4 posi = texture2D( pos , uv );
After, I calculate the bucket that the particle is in:
float x_bkt = floor( (x + (screenSize.x/2.0) )/cellSize);
float y_bkt = floor( (y + (screenSize.y/2.0) )/cellSize);
float x_bkt_ind_start = x_bkt * particlesPerCell;
float y_bkt_ind_start = y_bkt * particlesPerCell;
All of this is correct. Like I am getting the correct values and if I set these as the output values of the shader and read the pixels they are the correct values. I also changed my implementation a little and this code works fine.
In order to text the for loop, I replaced the pixel lookup coordinates in the grid bucket by the pixel positions. I adapted the code and it works fine, however I have to recalculate the buckets multiple times per frame so the code is not very efficient. If instead of storing the pixel positions I store the uv coordinates of the pixels and then do a lookup using those uv positions:
//get the texture coordinate that is offset by the for loop
float x_bkt_ind = (x_bkt_ind_start + i)/bucketsWidth;
float y_bkt_ind = (y_bkt_ind_start + j)/bucketsHeight;
//use the texture coordinates to get the stored texture coordinate in the actual position table from the bucket table
vec4 ind2 = texture2D( buckets , vec2(x_bkt_ind,y_bkt_ind) );
and then I actually get the position
vec4 pos2 = texture2D( pos , vec2(ind2.xy)/res );
this pos2 value will be wrong. I am pretty sure that the ind2 value is correct because if instead of storing a pixel coordinate in that bucket table I store position values and remove the second texture lookup, the code runs fine. But using the second lookup causes the code to break.
In the original post if I set the bucket to be any value, lets say the middle of the texture, and iterate over every possible bucket coordinate around the pixel, it works fine. However if I calculate the bucket position and iterate over every pixel it does not. I wonder if it has to do with the say glsl compiles the shaders and that some sort of optimization it is making is causing the double texture lookups to break in the for look. Or it is just a mistake in my code. I was able to get the single texture lookup in a for loop working when I just stored position values in the bucket texture.

Optimize WebGL shader?

I wrote the following shader to render a pattern with a bunch of concentric circles. Eventually I want to have each rotating sphere be a light emitter to create something along these lines.
Of course right now I'm just doing the most basic part to render the different objects.
Unfortunately the shader is incredibly slow (16fps full screen on a high-end macbook). I'm pretty sure this is due to the numerous for loops and branching that I have in the shader. I'm wondering how I can pull off the geometry I'm trying to achieve in a more performance optimized way:
EDIT: you can run the shader here: https://www.shadertoy.com/view/lssyRH
One obvious optimization I am missing is that currently all the fragments are checked against the entire 24 surrounding circles. It would be pretty quick and easy to just discard these checks entirely by checking if the fragment intersects the outer bounds of the diagram. I guess I'm just trying to get a handle on how the best practice is of doing something like this.
#define N 10
#define M 5
#define K 24
#define M_PI 3.1415926535897932384626433832795
void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
float aspectRatio = iResolution.x / iResolution.y;
float h = 1.0;
float w = aspectRatio;
vec2 uv = vec2(fragCoord.x / iResolution.x * aspectRatio, fragCoord.y / iResolution.y);
float radius = 0.01;
float orbitR = 0.02;
float orbiterRadius = 0.005;
float centerRadius = 0.002;
float encloseR = 2.0 * orbitR;
float encloserRadius = 0.002;
float spacingX = (w / (float(N) + 1.0));
float spacingY = h / (float(M) + 1.0);
float x = 0.0;
float y = 0.0;
vec4 totalLight = vec4(0.0, 0.0, 0.0, 1.0);
for (int i = 0; i < N; i++) {
for (int j = 0; j < M; j++) {
// compute the center of the diagram
vec2 center = vec2(spacingX * (float(i) + 1.0), spacingY * (float(j) + 1.0));
x = center.x + orbitR * cos(iGlobalTime);
y = center.y + orbitR * sin(iGlobalTime);
vec2 bulb = vec2(x,y);
if (length(uv - center) < centerRadius) {
// frag intersects white center marker
fragColor = vec4(1.0);
return;
} else if (length(uv - bulb) < radius) {
// intersects rotating "light"
fragColor = vec4(uv,0.5+0.5*sin(iGlobalTime),1.0);
return;
} else {
// intersects one of the enclosing 24 cylinders
for(int k = 0; k < K; k++) {
float theta = M_PI * 2.0 * float(k)/ float(K);
x = center.x + cos(theta) * encloseR;
y = center.y + sin(theta) * encloseR;
vec2 encloser = vec2(x,y);
if (length(uv - encloser) < encloserRadius) {
fragColor = vec4(uv,0.5+0.5*sin(iGlobalTime),1.0);
return;
}
}
}
}
}
}
Keeping in mind that you want to optimize the fragment shader, and only the fragment shader:
Move the sin(iGlobalTime) and cos(iGlobalTime) out of the loops, these remain static over the whole draw call so no need to recalculate them every loop iteration.
GPUs employ vectorized instruction sets (SIMD) where possible, take advantage of that. You're wasting lots of cycles by doing multiple scalar ops where you could use a single vector instruction(see annotated code)
[Three years wiser me here: I'm not really sure if this statement is true in regards to how modern GPUs process the instructions, however it certainly does help readability and maybe even give a hint or two to the compiler]
Do your radius checks squared, save that sqrt(length) for when you really need it
Replace float casts of constants(your loop limits) with a float constant(intelligent shader compilers will already do this, not something to count on though)
Don't have undefined behavior in your shader(not writing to gl_FragColor)
Here is an optimized and annotated version of your shader(still containing that undefined behavior, just like the one you provided). Annotation is in the form of:
// annotation
// old code, if any
new code
#define N 10
// define float constant N
#define fN 10.
#define M 5
// define float constant M
#define fM 5.
#define K 24
// define float constant K
#define fK 24.
#define M_PI 3.1415926535897932384626433832795
// predefine 2 times PI
#define M_PI2 6.28318531
void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
float aspectRatio = iResolution.x / iResolution.y;
// we dont need these separate
// float h = 1.0;
// float w = aspectRatio;
// use vector ops(2 divs 1 mul => 1 div 1 mul)
// vec2 uv = vec2(fragCoord.x / iResolution.x * aspectRatio, fragCoord.y / iResolution.y);
vec2 uv = fragCoord.xy / iResolution.xy;
uv.x *= aspectRatio;
// most of the following declarations should be predefined or marked as "const"...
float radius = 0.01;
// precalc squared radius
float radius2 = radius*radius;
float orbitR = 0.02;
float orbiterRadius = 0.005;
float centerRadius = 0.002;
// precalc squared center radius
float centerRadius2 = centerRadius * centerRadius;
float encloseR = 2.0 * orbitR;
float encloserRadius = 0.002;
// precalc squared encloser radius
float encloserRadius2 = encloserRadius * encloserRadius;
// Use float constants and vector ops here(2 casts 2 adds 2 divs => 1 add 1 div)
// float spacingX = w / (float(N) + 1.0);
// float spacingY = h / (float(M) + 1.0);
vec2 spacing = vec2(aspectRatio, 1.0) / (vec2(fN, fM)+1.);
// calc sin and cos of global time
// saves N*M(sin,cos,2 muls)
vec2 stct = vec2(sin(iGlobalTime), cos(iGlobalTime));
vec2 orbit = orbitR * stct;
// not needed anymore
// float x = 0.0;
// float y = 0.0;
// was never used
// vec4 totalLight = vec4(0.0, 0.0, 0.0, 1.0);
for (int i = 0; i < N; i++) {
for (int j = 0; j < M; j++) {
// compute the center of the diagram
// Use vector ops
// vec2 center = vec2(spacingX * (float(i) + 1.0), spacingY * (float(j) + 1.0));
vec2 center = spacing * (vec2(i,j)+1.0);
// Again use vector opts, use precalced time trig(orbit = orbitR * stct)
// x = center.x + orbitR * cos(iGlobalTime);
// y = center.y + orbitR * sin(iGlobalTime);
// vec2 bulb = vec2(x,y);
vec2 bulb = center + orbit;
// calculate offsets
vec2 centerOffset = uv - center;
vec2 bulbOffset = uv - bulb;
// use squared length check
// if (length(uv - center) < centerRadius) {
if (dot(centerOffset, centerOffset) < centerRadius2) {
// frag intersects white center marker
fragColor = vec4(1.0);
return;
// use squared length check
// } else if (length(uv - bulb) < radius) {
} else if (dot(bulbOffset, bulbOffset) < radius2) {
// Use precalced sin global time in stct.x
// intersects rotating "light"
fragColor = vec4(uv,0.5+0.5*stct.x,1.0);
return;
} else {
// intersects one of the enclosing 24 cylinders
for(int k = 0; k < K; k++) {
// use predefined 2*PI and float K
float theta = M_PI2 * float(k) / fK;
// Use vector ops(2 muls 2 adds => 1 mul 1 add)
// x = center.x + cos(theta) * encloseR;
// y = center.y + sin(theta) * encloseR;
// vec2 encloser = vec2(x,y);
vec2 encloseOffset = uv - (center + vec2(cos(theta),sin(theta)) * encloseR);
if (dot(encloseOffset,encloseOffset) < encloserRadius2) {
fragColor = vec4(uv,0.5+0.5*stct.x,1.0);
return;
}
}
}
}
}
}
I did a little more thinking ... I realized the best way to optimize it is to actually change the logic so that before doing intersection tests on the small circles it checks the bounds of the group of circles. This got it to run at 60fps:
Example here:
https://www.shadertoy.com/view/lssyRH

Assimp animation bone transformation

Recently I'm working on bone animation import, so I made a 3d minecraft-like model with some IK technique to test Assimp animation import. Ouput format is COLLADA(*.dae),and the tool I used is Blender. On the programming side, my enviroment is opengl/glm/assimp. I think these information for my problem is enough.One thing, the animation of the model, I just record 7 unmove keyframe for testing assimp animation.
First, I guess my transformation except local transform part is correct, so let the function only return glm::mat4(1.0f), and the result show the bind pose(not sure) model. (see below image)
Second, Turn back the value glm::mat4(1.0f) to bone->localTransform = transform * scaling * glm::mat4(1.0f);, then the model deform. (see below image)
Test image and model in blender:
(bone->localTransform = glm::mat4(1.0f) * scaling * rotate; : this image is under ground :( )
The code here:
void MeshModel::UpdateAnimations(float time, std::vector<Bone*>& bones)
{
for each (Bone* bone in bones)
{
glm::mat4 rotate = GetInterpolateRotation(time, bone->rotationKeys);
glm::mat4 transform = GetInterpolateTransform(time, bone->transformKeys);
glm::mat4 scaling = GetInterpolateScaling(time, bone->scalingKeys);
//bone->localTransform = transform * scaling * glm::mat4(1.0f);
//bone->localTransform = glm::mat4(1.0f) * scaling * rotate;
//bone->localTransform = glm::translate(glm::mat4(1.0f), glm::vec3(0.5f));
bone->localTransform = glm::mat4(1.0f);
}
}
void MeshModel::UpdateBone(Bone * bone)
{
glm::mat4 parentTransform = bone->getParentTransform();
bone->nodeTransform = parentTransform
* bone->transform // assimp_node->mTransformation
* bone->localTransform; // T S R matrix
bone->finalTransform = globalInverse
* bone->nodeTransform
* bone->inverseBindPoseMatrix; // ai_mesh->mBones[i]->mOffsetMatrix
for (int i = 0; i < (int)bone->children.size(); i++) {
UpdateBone(bone->children[i]);
}
}
glm::mat4 Bone::getParentTransform()
{
if (this->parent != nullptr)
return parent->nodeTransform;
else
return glm::mat4(1.0f);
}
glm::mat4 MeshModel::GetInterpolateRotation(float time, std::vector<BoneKey>& keys)
{
// we need at least two values to interpolate...
if ((int)keys.size() == 0) {
return glm::mat4(1.0f);
}
if ((int)keys.size() == 1) {
return glm::mat4_cast(keys[0].rotation);
}
int rotationIndex = FindBestTimeIndex(time, keys);
int nextRotationIndex = (rotationIndex + 1);
assert(nextRotationIndex < (int)keys.size());
float DeltaTime = (float)(keys[nextRotationIndex].time - keys[rotationIndex].time);
float Factor = (time - (float)keys[rotationIndex].time) / DeltaTime;
if (Factor < 0.0f)
Factor = 0.0f;
if (Factor > 1.0f)
Factor = 1.0f;
assert(Factor >= 0.0f && Factor <= 1.0f);
const glm::quat& startRotationQ = keys[rotationIndex].rotation;
const glm::quat& endRotationQ = keys[nextRotationIndex].rotation;
glm::quat interpolateQ = glm::lerp(endRotationQ, startRotationQ, Factor);
interpolateQ = glm::normalize(interpolateQ);
return glm::mat4_cast(interpolateQ);
}
glm::mat4 MeshModel::GetInterpolateTransform(float time, std::vector<BoneKey>& keys)
{
// we need at least two values to interpolate...
if ((int)keys.size() == 0) {
return glm::mat4(1.0f);
}
if ((int)keys.size() == 1) {
return glm::translate(glm::mat4(1.0f), keys[0].vector);
}
int translateIndex = FindBestTimeIndex(time, keys);
int nextTranslateIndex = (translateIndex + 1);
assert(nextTranslateIndex < (int)keys.size());
float DeltaTime = (float)(keys[nextTranslateIndex].time - keys[translateIndex].time);
float Factor = (time - (float)keys[translateIndex].time) / DeltaTime;
if (Factor < 0.0f)
Factor = 0.0f;
if (Factor > 1.0f)
Factor = 1.0f;
assert(Factor >= 0.0f && Factor <= 1.0f);
const glm::vec3& startTranslate = keys[translateIndex].vector;
const glm::vec3& endTrabslate = keys[nextTranslateIndex].vector;
glm::vec3 delta = endTrabslate - startTranslate;
glm::vec3 resultVec = startTranslate + delta * Factor;
return glm::translate(glm::mat4(1.0f), resultVec);
}
The code idea is referenced from Matrix calculations for gpu skinning and Skeletal Animation With Assimp.
Overall, I fectch all the information from assimp to MeshModel and save it to the bone structure, so I think the information is alright?
The last thing, my vertex shader code:
#version 330 core
#define MAX_BONES_PER_VERTEX 4
in vec3 position;
in vec2 texCoord;
in vec3 normal;
in ivec4 boneID;
in vec4 boneWeight;
const int MAX_BONES = 100;
uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;
uniform mat4 boneTransform[MAX_BONES];
out vec3 FragPos;
out vec3 Normal;
out vec2 TexCoords;
out float Visibility;
const float density = 0.007f;
const float gradient = 1.5f;
void main()
{
mat4 boneTransformation = boneTransform[boneID[0]] * boneWeight[0];
boneTransformation += boneTransform[boneID[1]] * boneWeight[1];
boneTransformation += boneTransform[boneID[2]] * boneWeight[2];
boneTransformation += boneTransform[boneID[3]] * boneWeight[3];
vec3 usingPosition = (boneTransformation * vec4(position, 1.0)).xyz;
vec3 usingNormal = (boneTransformation * vec4(normal, 1.0)).xyz;
vec4 viewPos = view * model * vec4(usingPosition, 1.0);
gl_Position = projection * viewPos;
FragPos = vec3(model * vec4(usingPosition, 1.0f));
Normal = mat3(transpose(inverse(model))) * usingNormal;
TexCoords = texCoord;
float distance = length(viewPos.xyz);
Visibility = exp(-pow(distance * density, gradient));
Visibility = clamp(Visibility, 0.0f, 1.0f);
}
If my question above, lack of code or describe vaguely, please let me know, Thanks!
Edit:(1)
In additional, my bone information like this(code fetching part):
for (int i = 0; i < (int)nodeAnim->mNumPositionKeys; i++)
{
BoneKey key;
key.time = nodeAnim->mPositionKeys[i].mTime;
aiVector3D vec = nodeAnim->mPositionKeys[i].mValue;
key.vector = glm::vec3(vec.x, vec.y, vec.z);
currentBone->transformKeys.push_back(key);
}
had some transformation vector, so my code above glm::mat4 transform = GetInterpolateTransform(time, bone->transformKeys);,Absloutely, get the same value from it. I'm not sure I made a nomove keyframe animation that provide the transform values is true or not (of course it has 7 keyframe).
A keyframe contents like this(debug on head bone):
7 different keyframe, same vector value.
Edit:(2)
If you want to test my dae file, I put it in jsfiddle, come and take it :). Another thing, in Unity my file work correctly, so I think maybe not my local transform occurs the problem, it seems the problem could be some other like parentTransform or bone->transform...etc? I aslo add local transform matrix with all bone, But can not figure out why COLLADA contains these value for my unmove animation...
For amounts of testing, and finally found the problem is the UpdateBone() part.
Before I point out my problem, I need to say the series of matrix multiplication let me confuse, but when I found the solution, it just make me totally (maybe just 90%) realize all the matrix doing.
The problem comes from the article,Matrix calculations for gpu skinning. I assumed the answer code is absolutely right and don't think any more about the matrix should be used. Thus, misusing matrix terribly transfer my look into the local transform matrix. Back to the result image in my question section is bind pose when I change the local transform matrix to return glm::mat4(1.0f).
So the question is why the changed make the bind pose? I assumed the problem must be local transform in bone space, but I'm wrong. Before I give the answer, look at the code below:
void MeshModel::UpdateBone(Bone * bone)
{
glm::mat4 parentTransform = bone->getParentTransform();
bone->nodeTransform = parentTransform
* bone->transform // assimp_node->mTransformation
* bone->localTransform; // T S R matrix
bone->finalTransform = globalInverse
* bone->nodeTransform
* bone->inverseBindPoseMatrix; // ai_mesh->mBones[i]->mOffsetMatrix
for (int i = 0; i < (int)bone->children.size(); i++) {
UpdateBone(bone->children[i]);
}
}
And I make the change as below:
void MeshModel::UpdateBone(Bone * bone)
{
glm::mat4 parentTransform = bone->getParentTransform();
if (boneName == "Scene" || boneName == "Armature")
{
bone->nodeTransform = parentTransform
* bone->transform // when isn't bone node, using assimp_node->mTransformation
* bone->localTransform; //this is your T * R matrix
}
else
{
bone->nodeTransform = parentTransform // This retrieve the transformation one level above in the tree
* bone->localTransform; //this is your T * R matrix
}
bone->finalTransform = globalInverse // scene->mRootNode->mTransformation
* bone->nodeTransform //defined above
* bone->inverseBindPoseMatrix; // ai_mesh->mBones[i]->mOffsetMatrix
for (int i = 0; i < (int)bone->children.size(); i++) {
UpdateBone(bone->children[i]);
}
}
I don't know what the assimp_node->mTransformation give me before, only the description "The transformation relative to the node's parent" in the assimp documentation. For some testing, I found that the mTransformation is the bind pose matrix which the current node relative to parent if I use these on bone node. Let me give a picture that captured the matrix on head bone.
The left part is the transform which is fetched from assimp_node->mTransformation.The right part is my unmove animation's localTransform which is calculated by the keys from nodeAnim->mPositionKeys, nodeAnim->mRotationKeys and nodeAnim->mScalingKeys.
Look back what I did, I made a bind pose tranformation twice, so the image in my question section look just seperate but not spaghetti :)
On the last, let me show what I did before the unmove animation testing and correct animation result.
(For everyone, If my concept is wrong , please point me out! Thx.)

How do I convert a vec4 rgba value to a float?

I packed some float data in a texture as an unsigned_byte, my only option in webgl. Now I would like unpack it in the vertex shader. When I sample a pixel I get a vec4 which is really one of my floats. How do I convert from the vec4 to a float?
The following code is specifically for the iPhone 4 GPU using OpenGL ES 2.0. I have no experience with WebGL so I cant claim to know how the code will work in that context. Furthermore the main problem here is that highp float is not 32 bits but is instead 24 bit.
My solution is for fragment shaders - I didnt try it in the vertex shader but it shouldnt be any different. In order to use the you will need to get the RGBA texel from a sampler2d uniform and make sure that the values of each R,G,B and A channels are between 0.0 and 255.0 . This is easy to achieve as follows:
highp vec4 rgba = texture2D(textureSamplerUniform, texcoordVarying)*255.0;
You should be aware though that the endianess of your machine will dictate the correct order of your bytes. The above code assumes that floats are stored in big-endian order. If you see your results are wrong then just swap the order of the data by writing
rgba.rgba=rgba.abgr;
immediately after the line where you set it. Alternatively swap the indices on rgba. I think the above line is more intutive though and less prone to careless errors.
I am not sure if it works for all given input. I tested for a large range of numbers and found that decode32 and encode32 are NOT exact inverses. Ive also left out the code I used to test it.
#pragma STDGL invariant(all)
highp vec4 encode32(highp float f) {
highp float e =5.0;
highp float F = abs(f);
highp float Sign = step(0.0,-f);
highp float Exponent = floor(log2(F));
highp float Mantissa = (exp2(- Exponent) * F);
Exponent = floor(log2(F) + 127.0) + floor(log2(Mantissa));
highp vec4 rgba;
rgba[0] = 128.0 * Sign + floor(Exponent*exp2(-1.0));
rgba[1] = 128.0 * mod(Exponent,2.0) + mod(floor(Mantissa*128.0),128.0);
rgba[2] = floor(mod(floor(Mantissa*exp2(23.0 -8.0)),exp2(8.0)));
rgba[3] = floor(exp2(23.0)*mod(Mantissa,exp2(-15.0)));
return rgba;
}
highp float decode32(highp vec4 rgba) {
highp float Sign = 1.0 - step(128.0,rgba[0])*2.0;
highp float Exponent = 2.0 * mod(rgba[0],128.0) + step(128.0,rgba[1]) - 127.0;
highp float Mantissa = mod(rgba[1],128.0)*65536.0 + rgba[2]*256.0 +rgba[3] + float(0x800000);
highp float Result = Sign * exp2(Exponent) * (Mantissa * exp2(-23.0 ));
return Result;
}
void main()
{
highp float result;
highp vec4 rgba=encode32(-10.01);
result = decode32(rgba);
}
Here are some links on IEEE precision I found useful. Link1. Link2. Link3.
Twerdster posted some excellent code in his answer. So all credit go to him. I post this new answer, since comments don't allow for nice syntax colored code blocks, and i wanted to share some code. But if you like the code, please upvote Twerdster original answer.
In Twerdster previous post he mentioned that the decode and encode might not work for all values.
To further test this, and validate the result i made a java program. While porting the code i tried to stayed as close as possible to the shader code (therefore i implemented some helper functions).
Note: I also use a store/load function to similate what happens when you write/read from a texture.
I found out that:
You need a special case for the zero
You might also need special case for infinity, but i did not implement that to keep the shader simple (eg: faster)
Because of rounding errors sometimes the result was wrong therefore:
subtract 1 from exponent when because of rounding the mantissa is not properly normalised (eg mantissa < 1)
Change float Mantissa = (exp2(- Exponent) * F); to float Mantissa = F/exp2(Exponent); to reduce precision errors
Use float Exponent = floor(log2(F)); to calc exponent. (simplified by new mantissa check)
Using these small modifications i got equal output on almost all inputs, and got only small errors between the original and encoded/decoded value when things do go wrong, while in Twerdster's original implementation rounding errors often resulted in the wrong exponent (thus the result being off by factor two).
Please note that this is a Java test application which i wrote to test the algorithm. I hope this will also work when ported to the GPU. If anybody tries to run it on a GPU, please leave a comment with your experience.
And for the code with a simple test to try different numbers until it failes.
import java.io.PrintStream;
import java.util.Random;
public class BitPacking {
public static float decode32(float[] v)
{
float[] rgba = mult(255, v);
float sign = 1.0f - step(128.0f,rgba[0])*2.0f;
float exponent = 2.0f * mod(rgba[0],128.0f) + step(128.0f,rgba[1]) - 127.0f;
if(exponent==-127)
return 0;
float mantissa = mod(rgba[1],128.0f)*65536.0f + rgba[2]*256.0f +rgba[3] + ((float)0x800000);
return sign * exp2(exponent-23.0f) * mantissa ;
}
public static float[] encode32(float f) {
float F = abs(f);
if(F==0){
return new float[]{0,0,0,0};
}
float Sign = step(0.0f,-f);
float Exponent = floor(log2(F));
float Mantissa = F/exp2(Exponent);
if(Mantissa < 1)
Exponent -= 1;
Exponent += 127;
float[] rgba = new float[4];
rgba[0] = 128.0f * Sign + floor(Exponent*exp2(-1.0f));
rgba[1] = 128.0f * mod(Exponent,2.0f) + mod(floor(Mantissa*128.0f),128.0f);
rgba[2] = floor(mod(floor(Mantissa*exp2(23.0f -8.0f)),exp2(8.0f)));
rgba[3] = floor(exp2(23.0f)*mod(Mantissa,exp2(-15.0f)));
return mult(1/255.0f, rgba);
}
//shader build-in's
public static float exp2(float x){
return (float) Math.pow(2, x);
}
public static float[] step(float edge, float[] x){
float[] result = new float[x.length];
for(int i=0; i<x.length; i++)
result[i] = x[i] < edge ? 0.0f : 1.0f;
return result;
}
public static float step(float edge, float x){
return x < edge ? 0.0f : 1.0f;
}
public static float mod(float x, float y){
return x-y * floor(x/y);
}
public static float floor(float x){
return (float) Math.floor(x);
}
public static float pow(float x, float y){
return (float)Math.pow(x, y);
}
public static float log2(float x)
{
return (float) (Math.log(x)/Math.log(2));
}
public static float log10(float x)
{
return (float) (Math.log(x)/Math.log(10));
}
public static float abs(float x)
{
return (float)Math.abs(x);
}
public static float log(float x)
{
return (float)Math.log(x);
}
public static float exponent(float x)
{
return floor((float)(Math.log(x)/Math.log(10)));
}
public static float mantissa(float x)
{
return floor((float)(Math.log(x)/Math.log(10)));
}
//shorter matrix multiplication
private static float[] mult(float scalar, float[] w){
float[] result = new float[4];
for(int i=0; i<4; i++)
result[i] = scalar * w[i];
return result;
}
//simulate storage and retrieval in 4-channel/8-bit texture
private static float[] load(int[] v)
{
return new float[]{v[0]/255f, v[1]/255f, v[2]/255f, v[3]/255f};
}
private static int[] store(float[] v)
{
return new int[]{((int) (v[0]*255))& 0xff, ((int) (v[1]*255))& 0xff, ((int) (v[2]*255))& 0xff, ((int) (v[3]*255))& 0xff};
}
//testing until failure, and some specific hard-cases separately
public static void main(String[] args) {
//for(float v : new float[]{-2097151.0f}){ //small error here
for(float v : new float[]{3.4028233e+37f, 8191.9844f, 1.0f, 0.0f, 0.5f, 1.0f/3, 0.1234567890f, 2.1234567890f, -0.1234567890f, 1234.567f}){
float output = decode32(load(store(encode32(v))));
PrintStream stream = (v==output) ? System.out : System.err;
stream.println(v + " ?= " + output);
}
//System.exit(0);
Random r = new Random();
float max = 3200000f;
float min = -max;
boolean error = false;
int trials = 0;
while(!error){
float fin = min + r.nextFloat() * ((max - min) + 1);
float fout = decode32(load(store(encode32(fin))));
if(trials % 10000 == 0)
System.out.print('.');
if(trials % 1000000 == 0)
System.out.println();
if(fin != fout){
System.out.println();
System.out.println("correct trials = " + trials);
System.out.println(fin + " vs " + fout);
error = true;
}
trials++;
}
}
}
I tried Arjans solution, but it returned invalid values for 0, 1, 2, 4. There was a bug with the packing of the exponent, which i changed so the exp takes one 8bit float and the sign is packed with the mantissa:
//unpack a 32bit float from 4 8bit, [0;1] clamped floats
float unpackFloat4( vec4 _packed)
{
vec4 rgba = 255.0 * _packed;
float sign = step(-128.0, -rgba[1]) * 2.0 - 1.0;
float exponent = rgba[0] - 127.0;
if (abs(exponent + 127.0) < 0.001)
return 0.0;
float mantissa = mod(rgba[1], 128.0) * 65536.0 + rgba[2] * 256.0 + rgba[3] + (0x800000);
return sign * exp2(exponent-23.0) * mantissa ;
}
//pack a 32bit float into 4 8bit, [0;1] clamped floats
vec4 packFloat(float f)
{
float F = abs(f);
if(F == 0.0)
{
return vec4(0,0,0,0);
}
float Sign = step(0.0, -f);
float Exponent = floor( log2(F));
float Mantissa = F/ exp2(Exponent);
//std::cout << " sign: " << Sign << ", exponent: " << Exponent << ", mantissa: " << Mantissa << std::endl;
//denormalized values if all exponent bits are zero
if(Mantissa < 1.0)
Exponent -= 1;
Exponent += 127;
vec4 rgba;
rgba[0] = Exponent;
rgba[1] = 128.0 * Sign + mod(floor(Mantissa * float(128.0)),128.0);
rgba[2] = floor( mod(floor(Mantissa* exp2(float(23.0 - 8.0))), exp2(8.0)));
rgba[3] = floor( exp2(23.0)* mod(Mantissa, exp2(-15.0)));
return (1 / 255.0) * rgba;
}
Since you didn't deign to give us the exact code you used to create and upload your texture, I can only guess at what you're doing.
You seem to be creating a JavaScript array of floating-point numbers. You then create a Uint8Array, passing that array to the constructor.
According to the WebGL spec (or rather, the spec that the WebGL spec refers to when ostensibly specifying this behavior), the conversion from floats to unsigned bytes happens in one of two ways, based on the destination. If the destination is considered "clamped", then it clamps the number to the destination range, namely [0, 255] for your case. If the destination is not considered "clamped", then it is taken modulo 28. The WebGL "specification" is sufficiently poor that it is not entirely clear whether the construction of Uint8Array is considered clamped or not. Whether clamped or taken modulo 28, the decimal point is chopped off and the integer value stored.
However, when you give this data to OpenWebGL, you told WebGL to interpret the bytes as normalized unsigned integer values. This means that the input values on the range [0, 255] will be accessed by users of the texture as [0, 1] floating point values.
So if your input array had the value 183.45, the value in the Uint8Array would be 183. The value in the texture would be 183/255, or 0.718. If your input value was 0.45, the Uint8Array would hold 0, and the texture result would be 0.0.
Now, because you passed the data as GL_RGBA, that means that every 4 unsigned bytes will be taken as a single texel. So every call to texture will fetch those particular four values (at the given texture coordinate, using the given filtering parameters), thus returning a vec4.
It is not clear what you intend to do with this floating-point data, so it is hard to make suggestions as to how best to pass float data to a shader. However, a general solution would be to use the OES_texture_float extension and actually create a texture that stores floating-point data. Of course, if it isn't available, you'll still have to find a way to do what you want.
BTW, Khronos really should be ashamed of themselves for even calling WebGL a specification. It barely specifies anything; it's just a bunch of references to other specifications, which makes finding the effects of anything exceedingly difficult.
You won't be able to just interpret the 4 unsigned bytes as the bits of a float value (which I assume you want) in a shader (at least not in GLES or WebGL, I think). What you can do is not store the float's bit representation in the 4 ubytes, but the bits of the mantissa (or the fixed point representation). For this you need to know the approximate range of the floats (I'll assume [0,1] here for simplicity, otherwise you have to scale differently, of course):
r = clamp(int(2^8 * f), 0, 255);
g = clamp(int(2^16 * f), 0, 255);
b = clamp(int(2^24 * f), 0, 255); //only have 24 bits of precision anyway
Of course you can also work directly with the mantissa bits. And then in the shader you can just reconstruct it that way, using the fact that the components of the vec4 are all in [0,1]:
f = (v.r) + (v.g / 2^8) + (v.b / 2^16);
Although I'm not sure if this will result in the exact same value, the powers of two should help a bit there.

Resources