How to get regular float32 geometry of a mesh compressed using gltfpack - three.js

I decimated my mesh using gltfpack with the command
gltfpack -i input.glb -o output.glb -si 0.01
This reduces the triangle count of my mesh by 99 percent.
Now my output.glb has geometry.position as an interleaved buffer attribute of data type unsigned Int 16. I am using ammo.js to make its physics body, which requires the geometry as a regular float32 array. My attempts to convert it by dividing by 2^32 - 1 and by 2^31 - 1 have failed.
I get a body, but its size and position are different, so it doesn't align with the model rendered by three.js.
Is there a way to get a regular float32 array so that I can pass it to the createConvexHullPhysicsShape function of Ammo.js?
I've added a sandbox that shows the issue.

Steps for quantizing and de-quantizing a normalized array can be found in the KHR_mesh_quantization specification, equivalent to what hardware APIs will do. For int16 that would be:
float = max(int / 32767.0, -1.0)
This does assume that the values are normalized. In gltfpack I think that is an option. You can check attribute.normalized in three.js. If it is false there is no need for conversion, you can just use the values as-is.
Finally, note that if the buffer is interleaved, you will have to de-interleave, as the buffer might contain vertex attributes other than position.
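As a rough illustration of the two steps above (de-interleaving and de-quantizing), here is a minimal sketch in plain JavaScript. The helper name dequantizeInt16Positions and its parameters are made up for this example and are not part of three.js. It assumes three Int16 components per vertex, with stride and offset counted in array elements; in three.js those values would normally be read from the interleaved attribute and its underlying buffer.

```javascript
// Hypothetical helper (not a three.js API): copies positions out of an
// interleaved Int16 buffer into a tightly packed Float32Array.
// `stride` and `offset` are counted in array elements, not bytes.
function dequantizeInt16Positions(interleaved, stride, offset, count, normalized) {
  const out = new Float32Array(count * 3);
  for (let i = 0; i < count; i++) {
    for (let c = 0; c < 3; c++) {
      const raw = interleaved[i * stride + offset + c];
      // Normalized int16, per KHR_mesh_quantization: max(int / 32767.0, -1.0).
      // Non-normalized values are used as-is.
      out[i * 3 + c] = normalized ? Math.max(raw / 32767.0, -1.0) : raw;
    }
  }
  return out;
}

// Two vertices with stride 4: x, y, z plus one padding element per vertex.
const packed = new Int16Array([32767, 0, -32768, 0, 16384, -16384, 0, 0]);
const floats = dequantizeInt16Positions(packed, 4, 0, 2, true);
console.log(Array.from(floats));
```

Passing normalized = false simply casts the integers to floats, which matches the "use the values as-is" case.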
UPDATE: September 9, 2022
The positions in this GLB do not actually appear to be normalized, in the WebGL sense. They are large integers, and there's an inverse scaling on the parent nodes that brings them back down into a smaller floating point range. You'll need to account for that when extracting positions from the Mesh to create a convex hull. For example:
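import * as THREE from "three";
import { deinterleaveAttribute } from "three/examples/jsm/utils/BufferGeometryUtils.js";
function getVertexPositions(root) {
  const positions = [];
  // make sure matrixWorld is up to date on every node
  root.updateMatrixWorld(true);
  root.traverse((obj) => {
    if (obj.type !== "Mesh") return;
    let position = obj.geometry.attributes.position;
    // de-interleave the array
    if (position.isInterleavedBufferAttribute) position = deinterleaveAttribute(position);
    // cast from Int16 to Float32
    position = new THREE.BufferAttribute(
      new Float32Array(position.array),
      position.itemSize,
      false
    );
    // apply mesh scaling to the positions array.
    position.applyMatrix4(obj.matrixWorld);
    // assumed return shape: one Float32Array per mesh
    positions.push(position.array);
  });
  return positions;
}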


How can I scale/interpolate an image with indexed values smoothly?

I want to scale grayscale images (input masks, really) with discrete values up smoothly. The values in these images are indexes that represent arbitrary concepts (e.g. "terrain types"; they are usually indices into a table), rather than values on a continuous scale, so they can't be averaged or blended in any way.
Do there exist algorithms that can do this with a more pleasing result than nearest-neighbour, which results in a very blocky, pixelated result? I am looking for something that will at least produce more rounded, more fluid results. The kind of thing that would be ideal would be a whitepaper, or a library (preferably in Java).
I've researched the subject, but I can't find anything. There is plenty about linear or cubic interpolation, etc., but that won't work for indexed values. The only algorithm I ever see mentioned that does not try to average values is nearest-neighbour. But there must be more?
Using colour here for clarity. I do of course understand that the preferred result here is impossible; I'm not asking for something that reconstitutes destroyed information, just hoping for something that will at least guesstimate something smoother than the first result.
Scan the destination image and for every corresponding source pixel (non-integer coordinates) check if the colors of the four surrounding pixels are the same. If yes, assign that color.
If not, perform as many bilinear interpolations as there are different colors. For this assign the weight 1 for a given color (each in turn) and 0 for the others, and interpolate the weight. Finally, keep the color with the largest weight.
By analytical geometry, one can show that in bilinear interpolation, the iso-weight curves are arcs of hyperbolas. If your magnification is large, you will see them. G1 continuity is not guaranteed. If this is an annoyance, you can work with G1 bicubic interpolation instead.
If this still does not satisfy you, you can try smooth approximating surfaces rather than interpolating ones. But the principle of keeping the color of maximum weight remains.
If there aren't many distinct colors and you want to use ready-made functions, you can work this out as follows:
split the image into several binary images (white for a chosen color, black for the background);
magnify all the images (to grayscale) using your favorite method;
then write a function that assigns every pixel the color that has the largest value among the magnified images.
You can also apply a smoothing filter to the binary images before or after magnification.
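The maximum-weight idea described above can be sketched in a few lines of plain JavaScript. This is only an illustration under stated assumptions: the function name upscaleIndexed is made up, the image is a flat row-major array of indices, the scale factor is an integer, and plain bilinear weights are used (no bicubic weights or extra smoothing).

```javascript
// Upscales an indexed image without ever blending index values:
// for each destination pixel, the bilinear weights of the four surrounding
// source pixels are summed per index, and the index with the largest total wins.
function upscaleIndexed(src, width, height, scale) {
  const outW = width * scale;
  const outH = height * scale;
  const out = new Uint8Array(outW * outH);
  for (let oy = 0; oy < outH; oy++) {
    // Map the destination pixel center back to source coordinates.
    const sy = Math.min(Math.max((oy + 0.5) / scale - 0.5, 0), height - 1);
    const y0 = Math.floor(sy);
    const y1 = Math.min(y0 + 1, height - 1);
    const fy = sy - y0;
    for (let ox = 0; ox < outW; ox++) {
      const sx = Math.min(Math.max((ox + 0.5) / scale - 0.5, 0), width - 1);
      const x0 = Math.floor(sx);
      const x1 = Math.min(x0 + 1, width - 1);
      const fx = sx - x0;
      const corners = [
        [src[y0 * width + x0], (1 - fx) * (1 - fy)],
        [src[y0 * width + x1], fx * (1 - fy)],
        [src[y1 * width + x0], (1 - fx) * fy],
        [src[y1 * width + x1], fx * fy],
      ];
      // Accumulate the weight per index and keep the index with the largest total.
      const totals = new Map();
      let best = corners[0][0];
      let bestWeight = -1;
      for (const [index, weight] of corners) {
        const total = (totals.get(index) || 0) + weight;
        totals.set(index, total);
        if (total > bestWeight) {
          bestWeight = total;
          best = index;
        }
      }
      out[oy * outW + ox] = best;
    }
  }
  return out;
}

// A 2x2 mask with three distinct indices, enlarged 8 times.
const mask = new Uint8Array([0, 1, 2, 1]);
const enlarged = upscaleIndexed(mask, 2, 2, 8);
console.log(enlarged.length);
```

Every output value is one of the input indices, so no invalid "in-between" terrain type can appear; only the shape of the region boundaries changes.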
For the sake of illustration, here is what you would get with two colors at a time (but this easily generalizes).
Color source image:
Smoothing applied to the binary equivalents:
Maximum weight decision:
One thing you could try is to extract a polygon for the boundary of each uniformly-colored region, then upscale and draw the polygon in the output image. You won’t create neatly rounded edges, but you will avoid the stair-case effect of the nearest neighbor interpolation. Upscaling polygons should avoid gaps between the regions too.
I guess that smoothing the shape for each value individually is a way to avoid undesired mixed values.
To handle the values individually, I started with your nearest-neighbour image and created 3 images { A.bmp, B.bmp, C.bmp } by hand.
(Each image has only one color region, and the background is black. For example, A.bmp is below:)
After smoothing the shape in each image, draw these shapes into one result image buffer, each with a different color.
// I use C++ and OpenCV
#include <opencv2/opencv.hpp>
#include <string>
int main()
{
    const std::string FileNames[3] = { "A.bmp", "B.bmp", "C.bmp" };
    const cv::Scalar ResultShowColor[3] = { cv::Scalar(0,255,255), cv::Scalar(0,255,0), cv::Scalar(0,0,255) };
    cv::Mat Imgs[3];
    const int KernelSize = 15;
    for( int i=0; i<3; ++i )
    {
        Imgs[i] = cv::imread( FileNames[i], cv::IMREAD_GRAYSCALE );
        if( Imgs[i].empty() )return 0;
        // binarize, blur, then re-threshold at 50% to get a smoothed shape
        cv::threshold( Imgs[i], Imgs[i], 32, 255, cv::THRESH_BINARY );
        cv::GaussianBlur( Imgs[i], Imgs[i], cv::Size(KernelSize,KernelSize), 0 );
        cv::threshold( Imgs[i], Imgs[i], 255*0.5, 255, cv::THRESH_BINARY );
        cv::imshow( FileNames[i], Imgs[i] );
    }
    cv::Mat ResultImg = cv::Mat::zeros( Imgs[0].size(), CV_8UC3 );
    // draw each smoothed shape in its own color, using the shape as a mask
    for( int i=0; i<3; ++i )
    {
        ResultImg.setTo( ResultShowColor[i], Imgs[i] );
    }
    cv::imshow( "ResultImg", ResultImg );
    if( cv::waitKey() == 's' ){ cv::imwrite( "ResultImg.png", ResultImg ); }
    return 0;
}
This is the result:
Admittedly, this result is not good enough: gaps exist at the boundaries between the shapes.
So some further ingenuity is required, but I am posting this in case it gives you a hint.

THREE.js: convert one world coordinate unit to screen pixels

For example, I have a unit cube. When there is no scaling, how many screen pixels represent one coordinate unit?
When I zoom in, how many screen pixels then represent this one coordinate unit?
I'm not entirely sure if I understood your question, but to get the distance between any two THREE vectors you can do this:
const distanceAB = new THREE.Vector3(1,1,1).sub( new THREE.Vector3(2,2,2) ).length()
Your unit cube would have vertices like (-0.5,-0.5,-0.5) and (0.5,0.5,0.5), among others. Either way, you need to obtain these values (easier when using THREE.Geometry than BufferGeometry).
Then you project these vertices:
const vertexA = new THREE.Vector3() // set this from cube
const vertexB = new THREE.Vector3()
const screenSpaceVector = new THREE.Vector3().subVectors(vertexA.project(myCamera),vertexB.project(myCamera))
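The result is now a difference between two points in NDC (normalized device coordinates), a cube going from -1 to 1 on each axis, so the full viewport is 2 units wide. Since this is a difference rather than a position, only scale it by 0.5 to express it as a fraction of the viewport (the 0.5 offset that applies to positions cancels out):
screenSpaceVector.multiplyScalar(0.5)
Finally, to figure out how many pixels this is:
const size = renderer.getSize(new THREE.Vector2()) // recent three.js versions require a target vector
screenSpaceVector.x *= size.x
screenSpaceVector.y *= size.y
screenSpaceVector.z = 0
const pixelLength = screenSpaceVector.length()
I think this should do the trick.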
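As a cross-check on the projection approach, the same number can be estimated in closed form. The sketch below is plain JavaScript with made-up function names, not a three.js API. It assumes the unit length lies in a plane perpendicular to the view direction at a known distance from the camera, and that zoom divides the visible height the way a camera zoom factor usually does.

```javascript
// Pixels covered by one world unit for a perspective camera.
// fovDegrees is the vertical field of view; distance is measured along the view direction.
function pixelsPerUnitPerspective(fovDegrees, distance, viewportHeightPx, zoom = 1) {
  const fovRad = (fovDegrees * Math.PI) / 180;
  // World-space height of the visible region at the given distance.
  const visibleHeight = (2 * distance * Math.tan(fovRad / 2)) / zoom;
  return viewportHeightPx / visibleHeight;
}

// Pixels covered by one world unit for an orthographic camera (independent of distance).
function pixelsPerUnitOrthographic(top, bottom, viewportHeightPx, zoom = 1) {
  return (viewportHeightPx * zoom) / (top - bottom);
}

// A 600 px tall viewport, 90 degree vertical FOV, object 5 units away.
console.log(pixelsPerUnitPerspective(90, 5, 600));
// Zooming in by a factor of 2 doubles the pixel count per unit.
console.log(pixelsPerUnitPerspective(90, 5, 600, 2));
```

With a perspective camera the answer depends on how far the cube is from the camera, which is why a single "pixels per unit" figure only makes sense for a chosen depth.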

depth peeling invariance in webgl (and threejs)

I'm looking at what I think is the first paper on depth peeling (the simplest algorithm?) and I want to implement it with WebGL, using three.js.
I think I understand the concept and was able to make several peels, with some logic that looks like this:
render(scene, camera) {
  const oldAutoClear = this._renderer.autoClear
  this._renderer.autoClear = false
  setDepthPeelActive(true) //sets a global injected uniform in a singleton elsewhere, every material in the scene has onBeforeRender injected with additional logic and uniforms
  let ping
  let pong
  for (let i = 0; i < this._numPasses; i++) {
    const pingPong = i % 2 === 0
    ping = pingPong ? 1 : 0
    pong = pingPong ? 0 : 1
    const writeRGBA = this._screenRGBA[i]
    const writeDepth = this._screenDepth[ping]
    setDepthPeelPassNumber(i) //was going to try increasing the polygonOffsetUnits here globally,
    if (i > 0) {
      //all but first pass write to depth
      const readDepth = this._screenDepth[pong]
      this._depthMaterial.uniforms.uFirstPass.value = 0
      this._depthMaterial.uniforms.uPrevDepthTex.value = readDepth
    } else {
      //first pass just renders to depth
      this._depthMaterial.uniforms.uFirstPass.value = 1
      this._depthMaterial.uniforms.uPrevDepthTex.value = null
    }
    scene.overrideMaterial = this._depthMaterial
    this._renderer.render(scene, camera, writeDepth, true)
    scene.overrideMaterial = null
    this._renderer.render(scene, camera, writeRGBA, true)
  }
  this._quad.material = this._blitMaterial
  // this._blitMaterial.uniforms.uTexture.value = this._screenDepth[ping]
  this._blitMaterial.uniforms.uTexture.value = this._screenRGBA[
  this._renderer.render(this._scene, this._camera)
  this._renderer.autoClear = oldAutoClear
}
I'm using gl_FragCoord.z to do the test, and packing the depth into an 8-bit RGBA texture, with a shader that looks like this:
float depth = gl_FragCoord.z;
vec4 pp = packDepthToRGBA( depth );
if( uFirstPass == 0 ){
  float prevDepth = unpackRGBAToDepth( texture2D( uPrevDepthTex , vSS));
  // assumed: reject fragments at or in front of the previously peeled layer
  if( depth <= prevDepth + 0.0001) {
    discard;
  }
}
gl_FragColor = pp;
Varying vSS is computed in the vertex shader, after the projection:
vSS.xy = gl_Position.xy * .5 + .5;
The basic idea seems to work and I get peels, but only if I use the fudge factor. It looks like it fails, though, as the angle gets more obtuse (which is why polygonOffset needs both the factor and the units, to account for the slope?).
I didn't understand at all how the invariance is solved. I don't understand how the mentioned extension is being used, other than that it seems to be overriding the fragment depth, but with what?
I must admit that I'm not even sure which interpolation is being referred to here, since every pixel is aligned; I'm just using nearest filtering.
I did see some hints about depth buffer precision but, not really understanding the issue, I wanted to try packing the depth into only three channels to see what happens.
The fact that such a small fudge factor makes it sort of work tells me that all these sampled and computed depths likely do exist in the same space. But this seems to be the same issue as when using gl.EQUAL for depth testing? For shits and giggles I tried to override the depth with the unpacked depth immediately after packing it, but it didn't seem to do anything.
Increasing the polygon offset with each peel seems to have done the trick. I got some fighting with the lines, though, but I think that's because I was already using an offset to draw them and I need to include that in the peel offset. I'd still love to understand more about the problem.
The depth buffer stores depths :) Depending on the 'far' and 'near' planes, the perspective projection tends to "stack" the depths of the points into just a short part of the buffer's range. It's not linear in z. You can see this for yourself by setting a different color depending on the depth and rendering a triangle that spans most of the near-far distance.
A shadow map stores depths (distances to the light), calculated after projection. Later, in the second or a following pass, you compare those "stacked" depths, and some comparisons fail because the values are very similar: hazardous variances.
You can use a more fine-grained depth buffer, 24 bits instead of 16 or 8 bits. This may solve part of the problem.
There's another issue: the perspective division, or z/w, needed to get normalized device coordinates (NDC). It occurs after the vertex shader, so gl_FragDepth = gl_FragCoord.z is affected.
The other approach is to store depths calculated in some space that suffers from neither "stacking" nor perspective division. Camera space is one. In other words, you can calculate the depth by undoing the projection in the vertex shader.
The article you link to is for the old fixed pipeline, without shaders. It shows an NVIDIA extension to deal with these variances.
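To make the "stacking" concrete, here is a small plain-JavaScript sketch that evaluates the depth a standard OpenGL-style perspective projection would store for a point at a given distance in front of the camera, together with the inverse mapping back to camera-space distance. The function names and the near/far values are chosen for illustration only, and a depth range of 0 to 1 is assumed.

```javascript
// Window-space depth (0..1) for a point `distance` units in front of the camera,
// assuming an OpenGL-style perspective projection with the given near and far planes.
function windowDepth(distance, near, far) {
  const ndcZ =
    (far + near) / (far - near) - (2 * far * near) / ((far - near) * distance);
  return 0.5 * ndcZ + 0.5;
}

// Inverse mapping: recovers the camera-space distance from a stored depth value.
function linearizeDepth(depth, near, far) {
  const ndcZ = 2 * depth - 1;
  return (2 * far * near) / (far + near - ndcZ * (far - near));
}

const near = 0.1;
const far = 100;
for (const distance of [0.1, 1, 10, 50, 100]) {
  // Most of the 0..1 range is already used up within the first unit of distance.
  console.log(distance, windowDepth(distance, near, far).toFixed(6));
}
```

With these planes, points at distances 50 and 100 end up with depths that differ by only about one part in a thousand, so an 8-bit channel or a coarse packing can easily make them compare as equal. Comparing linearized (camera-space) distances avoids that compression.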

PIXI.js - Canvas Coordinate to Container Coordinate

I have initiated a PIXI js canvas:
g_App = new PIXI.Application(800, 600, { backgroundColor: 0x1099bb });
Set up a container:
container = new PIXI.Container();
Put a background texture (2000x2000) into the container:
var texture = PIXI.Texture.fromImage('picBottom.png');
var back = new PIXI.Sprite(texture);
Set the global:
var g_Container = container;
I do various pivot points and rotations on container and canvas stage element:
// Set the focus point of the container
g_App.stage.x = Math.floor(400);
g_App.stage.y = Math.floor(500); // Note this one is not central
g_Container.pivot.set(1000, 1000);
g_Container.rotation = 1.5; // radians
Now I need to be able to convert a canvas pixel to the pixel on the background texture.
g_Container has an element transform, which in turn has several elements: localTransform, pivot, position, scale and skew. Similarly, g_App.stage has the same transform element.
In maths this is simple: you just have a vector point and do matrix operations on it. Then, to go back the other way, you just find the inverses of those matrices and multiply backwards.
So what do I do here in pixi.js?
How do I convert a pixel on the canvas and see what pixel it is on the background container?
Note: The following is written using the USA convention of using matrices. They have row vectors on the left and multiply them by the matrix on the right. (Us pesky Brits in the UK do the opposite. We have column vectors on the right and multiply it by the matrix on the left. This means UK and USA matrices to do the same job will look slightly different.)
Now I have confused you all, on with the answer.
g_Container.transform.localTransform - this matrix takes the world coords to the scaled/translated/rotated coords
g_App.stage.transform.localTransform - this matrix takes the rotated world coords and outputs screen (or, more accurately, HTML canvas) coords
So for example the Container matrix is:
MatContainer = [g_Container.transform.localTransform.a, g_Container.transform.localTransform.b, 0]
[g_Container.transform.localTransform.c, g_Container.transform.localTransform.d, 0]
[g_Container.transform.localTransform.tx, g_Container.transform.localTransform.ty, 1]
and the rotated container matrix to screen is:
MatToScreen = [g_App.stage.transform.localTransform.a, g_App.stage.transform.localTransform.b, 0]
[g_App.stage.transform.localTransform.c, g_App.stage.transform.localTransform.d, 0]
[g_App.stage.transform.localTransform.tx, g_App.stage.transform.localTransform.ty, 1]
So to get from World Coordinates to Screen Coordinates (noting our vector will be a row on the left, so the first operation matrix that acts first on the World coordinates must also be on the left), we would need to multiply the vector by:
MatAll = MatContainer * MatToScreen
So if you have a world coordinate vector vectWorld = [worldX, worldY, 1.0] (I'll explain the 1.0 at the end), then to get to the screen coords you would do the following:
vectScreen = vectWorld * MatAll
So to go from screen coords back to world coords, we first need to calculate the inverse matrix of MatAll; call it invMatAll. (There are loads of places that tell you how to do this, so I will not do it here.)
So if we have screen (canvas) coordinates screenX and screenY, we need to create a vector vectScreen = [screenX, screenY, 1.0] (again I will explain the 1.0 later), then to get to world coordinates worldX and worldY we do:
vectWorld = vectScreen * invMatAll
And that is it.
So what about the 1.0?
In a 2D system you can do rotations and scaling with 2x2 matrices. Unfortunately, you cannot do 2D translations with a 2x2 matrix. Consequently, you need 3x3 matrices to fully describe all 2D scaling, rotations and translations. This means you need to make your vector 3D as well, and you need to put a 1.0 in the third position in order to do the translations properly. This 1.0 will still be 1.0 after any of these matrix operations.
Note: If we were working in a 3D system we would need 4x4 matrices and put a dummy 1.0 in our 4D vectors for exactly the same reasons.
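The steps above (compose the two matrices, invert, apply) can be sketched in plain JavaScript without writing out the 3x3 form. The matrices below use the same a, b, c, d, tx, ty fields as the localTransform objects mentioned earlier, with x' = a*x + c*y + tx and y' = b*x + d*y + ty. All function names are made up for this illustration, and rotationAboutPivot is only an assumption about how a pivot, rotation and position combine (no scale or skew). PIXI also ships its own helpers for this kind of conversion, such as toLocal on display objects, which may be simpler to use in practice.

```javascript
// Applies an affine matrix to a point.
function applyMatrix(m, p) {
  return { x: m.a * p.x + m.c * p.y + m.tx, y: m.b * p.x + m.d * p.y + m.ty };
}

// Returns the matrix equivalent to applying `inner` first, then `outer`.
function multiply(outer, inner) {
  return {
    a: outer.a * inner.a + outer.c * inner.b,
    b: outer.b * inner.a + outer.d * inner.b,
    c: outer.a * inner.c + outer.c * inner.d,
    d: outer.b * inner.c + outer.d * inner.d,
    tx: outer.a * inner.tx + outer.c * inner.ty + outer.tx,
    ty: outer.b * inner.tx + outer.d * inner.ty + outer.ty,
  };
}

// Inverts an affine matrix (the "invMatAll" step).
function invert(m) {
  const det = m.a * m.d - m.b * m.c;
  if (det === 0) throw new Error("matrix is not invertible");
  return {
    a: m.d / det,
    b: -m.b / det,
    c: -m.c / det,
    d: m.a / det,
    tx: (m.c * m.ty - m.d * m.tx) / det,
    ty: (m.b * m.tx - m.a * m.ty) / det,
  };
}

// Hypothetical helper: rotation about a pivot, with the pivot placed at (posX, posY).
function rotationAboutPivot(rotation, pivotX, pivotY, posX = 0, posY = 0) {
  const cos = Math.cos(rotation);
  const sin = Math.sin(rotation);
  return {
    a: cos, b: sin, c: -sin, d: cos,
    tx: posX - (pivotX * cos - pivotY * sin),
    ty: posY - (pivotX * sin + pivotY * cos),
  };
}

// Values from the question: container pivot (1000, 1000), rotation 1.5, stage at (400, 500).
const containerMatrix = rotationAboutPivot(1.5, 1000, 1000);
const stageMatrix = { a: 1, b: 0, c: 0, d: 1, tx: 400, ty: 500 };
const toCanvas = multiply(stageMatrix, containerMatrix);
const toTexture = invert(toCanvas);
const canvasPoint = applyMatrix(toCanvas, { x: 1000, y: 1000 });
const texturePoint = applyMatrix(toTexture, { x: 400, y: 500 });
console.log(canvasPoint, texturePoint);
```

In this setup the pivot of the container lands on the stage position, so the canvas point (400, 500) should map back to approximately (1000, 1000) on the background texture.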

Detection of coins (and fit ellipses) on an image

I am currently working on a project where I am trying to detect a few coins lying on a flat surface (i.e. a desk). The coins do not overlap and are not hidden by other objects. But there might be other objects visible and the lighting conditions may not be perfect... Basically, consider yourself filming your desk which has some coins on it.
So each coin should be visible as an ellipse. Since I don't know the position of the camera, the shape of the ellipses may vary, from a circle (view from the top) to flat ellipses, depending on the angle the coins are filmed from.
My problem is that I am not sure how to extract the coins and finally fit ellipses over them (which I need for further calculations).
For now, I have just made the first attempt by setting a threshold value in OpenCV, using findContours() to get the contour lines and fitting an ellipse. Unfortunately, the contour lines only rarely give me the shape of the coins (reflections, bad lighting, ...) and this way is also not preferred since I don't want the user to set any threshold.
Another idea was to use a template matching method of an ellipse on that image, but since I don't know the angle of the camera nor the size of the ellipses I don't think this would work well...
So I wanted to ask if anybody could tell me a method that would work in my case.
Is there a fast way to extract the three coins from the image? The calculations should be made in realtime on mobile devices and the method should not be too sensitive for different or changing lights or the color of the background.
Would be great if anybody could give me any tips on which method could work for me.
Here's some C99 source implementing the traditional approach (based on OpenCV doco):
#include "cv.h"
#include "highgui.h"
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
// We need this to be high enough to get rid of things that are too small to
// have a definite shape. Otherwise, they will end up as ellipse false positives.
#define MIN_AREA 100.00
// One way to tell if an object is an ellipse is to look at the relationship
// of its area to its dimensions. If its actual occupied area can be estimated
// using the well-known area formula Area = PI*A*B, then it has a good chance of
// being an ellipse.
// This value is the maximum permissible error between actual and estimated area.
#define MAX_TOL 100.00
int main( int argc, char** argv )
{
    IplImage* src;
    // the first command line parameter must be the file name of a binary (black-n-white) image
    if( argc == 2 && (src=cvLoadImage(argv[1], 0))!= 0)
    {
        IplImage* dst = cvCreateImage( cvGetSize(src), 8, 3 );
        CvMemStorage* storage = cvCreateMemStorage(0);
        CvSeq* contour = 0;
        cvThreshold( src, src, 1, 255, CV_THRESH_BINARY );
        // Invert the image such that white is foreground, black is background.
        // Dilate to get rid of noise.
        cvXorS(src, cvScalar(255, 0, 0, 0), src, NULL);
        cvDilate(src, src, NULL, 2);
        cvFindContours( src, storage, &contour, sizeof(CvContour), CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE, cvPoint(0,0));
        cvZero( dst );
        for( ; contour != 0; contour = contour->h_next )
        {
            double actual_area = fabs(cvContourArea(contour, CV_WHOLE_SEQ, 0));
            if (actual_area < MIN_AREA)
                continue;
            // Assuming the axes of the ellipse are vertical/horizontal.
            CvRect rect = ((CvContour *)contour)->rect;
            int A = rect.width / 2;
            int B = rect.height / 2;
            double estimated_area = M_PI * A * B;
            double error = fabs(actual_area - estimated_area);
            if (error > MAX_TOL)
                continue;
            printf
            (
                "center x: %d y: %d A: %d B: %d\n",
                rect.x + A,
                rect.y + B,
                A,
                B
            );
            CvScalar color = CV_RGB( rand() % 255, rand() % 255, rand() % 255 );
            cvDrawContours( dst, contour, color, color, -1, CV_FILLED, 8, cvPoint(0,0));
        }
        cvSaveImage("coins.png", dst, 0);
    }
    return 0;
}
Given the binary image that Carnieri provided, this is the output:
./opencv-contour.out coin-ohtsu.pbm
center x: 291 y: 328 A: 54 B: 42
center x: 286 y: 225 A: 46 B: 32
center x: 471 y: 221 A: 48 B: 33
center x: 140 y: 210 A: 42 B: 28
center x: 419 y: 116 A: 32 B: 19
And this is the output image:
What you could improve on:
Handle different ellipse orientations (currently, I assume the axes are vertical/horizontal). This would not be hard to do using image moments.
Check for object convexity (have a look at cvConvexityDefects)
Your best way of distinguishing coins from other objects is probably going to be by shape. I can't think of any other low-level image features (color is obviously out). So, I can think of two approaches:
Traditional object detection
Your first task is to separate the objects (coins and non-coins) from the background. Otsu's method, as suggested by Carnieri, will work well here. You seem to worry about the images being bimodal, but I don't think this will be a problem. As long as there is a significant amount of desk visible, you're guaranteed to have one peak in your histogram. And as long as there are a couple of visually distinguishable objects on the desk, you are guaranteed your second peak.
Dilate your binary image a couple of times to get rid of noise left by thresholding. The coins are relatively big so they should survive this morphological operation.
Group the white pixels into objects using region growing -- just iteratively connect adjacent foreground pixels. At the end of this operation you will have a list of disjoint objects, and you will know which pixels each object occupies.
From this information, you will know the width and the height of the object (from the previous step). So, now you can estimate the size of the ellipse that would surround the object, and then see how well this particular object matches the ellipse. It may be easier just to use width vs height ratio.
Alternatively, you can then use moments to determine the shape of the object in a more precise way.
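As an illustration of the moments idea, the sketch below estimates an ellipse (center, semi-axes and orientation) from the pixels of one object using second-order central moments. It is plain JavaScript with a made-up function name, and it assumes the object is a filled region given as a list of [x, y] pixel coordinates, for which the variance along each principal axis is a quarter of the squared semi-axis.

```javascript
// Estimates the ellipse that best matches a filled pixel region via central moments.
function ellipseFromPixels(points) {
  const n = points.length;
  let cx = 0;
  let cy = 0;
  for (const [x, y] of points) {
    cx += x;
    cy += y;
  }
  cx /= n;
  cy /= n;
  // Normalized second-order central moments (the covariance of the pixel positions).
  let mu20 = 0;
  let mu02 = 0;
  let mu11 = 0;
  for (const [x, y] of points) {
    const dx = x - cx;
    const dy = y - cy;
    mu20 += dx * dx;
    mu02 += dy * dy;
    mu11 += dx * dy;
  }
  mu20 /= n;
  mu02 /= n;
  mu11 /= n;
  // Eigenvalues of the covariance matrix give the variance along each principal axis.
  const spread = Math.sqrt(4 * mu11 * mu11 + (mu20 - mu02) ** 2);
  const lambda1 = (mu20 + mu02 + spread) / 2;
  const lambda2 = Math.max((mu20 + mu02 - spread) / 2, 0);
  return {
    cx,
    cy,
    major: 2 * Math.sqrt(lambda1), // semi-major axis of a filled ellipse
    minor: 2 * Math.sqrt(lambda2), // semi-minor axis
    angle: 0.5 * Math.atan2(2 * mu11, mu20 - mu02), // orientation of the major axis, in radians
  };
}

// Rasterize a filled, axis-aligned ellipse with semi-axes 40 and 20 as test input.
const pixels = [];
for (let y = -20; y <= 20; y++) {
  for (let x = -40; x <= 40; x++) {
    if ((x / 40) ** 2 + (y / 20) ** 2 <= 1) pixels.push([x, y]);
  }
}
const fit = ellipseFromPixels(pixels);
console.log(fit);
```

Comparing the object's pixel count with Math.PI * major * minor then gives an area test like the one in the C listing, but without assuming the axes are vertical/horizontal.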
I don't know what the best method for your problem is. About thresholding specifically, however, you can use Otsu's method, which automatically finds the optimal threshold value based on an analysis of the image histogram. Use OpenCV's threshold method with the parameter ThresholdType equal to THRESH_OTSU.
Be aware, though, that Otsu's method works well only on images with bimodal histograms (for instance, images with bright objects on a dark background).
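For readers curious about what the automatic threshold selection does, here is a minimal plain-JavaScript sketch of Otsu's method on an 8-bit grayscale array. It is not OpenCV's implementation; the function name is made up, and pixels less than or equal to the returned value are treated as one class and the rest as the other.

```javascript
// Picks the threshold that maximizes the between-class variance of the histogram.
function otsuThreshold(gray) {
  const hist = new Array(256).fill(0);
  for (const value of gray) hist[value]++;
  const total = gray.length;
  let sumAll = 0;
  for (let i = 0; i < 256; i++) sumAll += i * hist[i];

  let sumBelow = 0; // sum of intensities in the class <= t
  let countBelow = 0; // number of pixels in the class <= t
  let bestVariance = -1;
  let bestThreshold = 0;
  for (let t = 0; t < 256; t++) {
    countBelow += hist[t];
    sumBelow += t * hist[t];
    if (countBelow === 0) continue; // nothing in the lower class yet
    const countAbove = total - countBelow;
    if (countAbove === 0) break; // everything is in the lower class
    const meanBelow = sumBelow / countBelow;
    const meanAbove = (sumAll - sumBelow) / countAbove;
    const betweenVariance = countBelow * countAbove * (meanBelow - meanAbove) ** 2;
    if (betweenVariance > bestVariance) {
      bestVariance = betweenVariance;
      bestThreshold = t;
    }
  }
  return bestThreshold;
}

// A clearly bimodal sample: half dark pixels (10), half bright pixels (200).
const sample = new Uint8Array(100).fill(10);
sample.fill(200, 50);
const threshold = otsuThreshold(sample);
console.log(threshold);
```

On a bimodal input like this, the chosen threshold falls between the two peaks, which is the situation the caveat above describes; with a single dominant peak the split is much less meaningful.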
You've probably seen this, but there is also a method for fitting an ellipse around a set of 2D points (for instance, a connected component).
EDIT: Otsu's method applied to a sample image:
Grayscale image:
Result of applying Otsu's method:
If anyone else comes along with this problem in the future as I did, but using C++:
Once you have used findContours to find the contours (as in Misha's answer above), you can easily fit ellipses using fitEllipse, eg
vector<vector<Point> > contours;
findContours(img, contours, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, Point(0,0));
vector<RotatedRect> rotRecs;
for (size_t i = 0; i < contours.size(); i++) {
    // fitEllipse needs at least 5 points per contour
    if (contours[i].size() >= 5)
        rotRecs.push_back(fitEllipse(contours[i]));
}
