Data structures and algorithms for adaptive "uniform" mesh? - computational-geometry

I need a data structure for storing float values on a uniformly sampled 3D mesh:
x = x0 + ix*dx where 0 <= ix < nx
y = y0 + iy*dy where 0 <= iy < ny
z = z0 + iz*dz where 0 <= iz < nz
Up to now I have used my Array class:
Array3D<float> A(nx, ny, nz);
A(0,0,0) = 0.0f; // ix = iy = iz = 0
Internally it stores the float values as a 1D array with nx * ny * nz elements.
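For concreteness, the accessor does something like this (z varies fastest; the exact ordering doesn't matter for what follows):
float& operator()(int ix, int iy, int iz) {
    // data is a contiguous float[nx * ny * nz]
    return data[(ix * ny + iy) * nz + iz];
}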
However, now I need to represent a mesh with more values than I have RAM,
e.g. nx = ny = nz = 2000.
I think many neighbouring nodes in such a mesh may have similar values, so I wonder whether there is some simple way to "coarsen" the mesh adaptively.
For instance, if the 8 (ix,iy,iz) nodes of a cell in this mesh have values that are less than 5% apart, they are "removed" and replaced by just one value: the mean of the 8 values.
How could I implement such a data structure in a simple and efficient way?
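In pseudo-form, the per-cell test I have in mind would look something like this (a sketch only; measuring the 5% criterion relative to the largest magnitude is just one possible choice):
#include <algorithm>
#include <cmath>
// Returns true and writes the mean if the 8 corner values of a cell
// are within 5% of each other, i.e. the cell can be coarsened.
bool tryCoarsen(const float corners[8], float& mean) {
    float lo = corners[0], hi = corners[0], sum = 0.0f;
    for (int i = 0; i < 8; ++i) {
        lo = std::min(lo, corners[i]);
        hi = std::max(hi, corners[i]);
        sum += corners[i];
    }
    float scale = std::max(std::fabs(lo), std::fabs(hi));
    if (scale == 0.0f || (hi - lo) / scale <= 0.05f) {
        mean = sum / 8.0f; // replace the 8 nodes by their mean
        return true;
    }
    return false;
}
An octree would then be the natural container: start from the full-resolution cells and merge 8 siblings into their parent whenever this test passes.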
EDIT:
Thanks, Ante, for suggesting lossy compression. I think this could work the following way:
#define BLOCK_SIZE 64

struct CompressedArray3D {
    CompressedArray3D(int ni, int nj, int nk) : ni(ni), nj(nj), nk(nk) {
        // number of blocks along each axis, rounded up
        NI = (ni + BLOCK_SIZE - 1) / BLOCK_SIZE;
        NJ = (nj + BLOCK_SIZE - 1) / BLOCK_SIZE;
        NK = (nk + BLOCK_SIZE - 1) / BLOCK_SIZE;
        blocks = new float*[NI*NJ*NK];
        compressedSize = new unsigned int[NI*NJ*NK];
    }
    void setBlock(int I, int J, int K, float values[BLOCK_SIZE][BLOCK_SIZE][BLOCK_SIZE]) {
        unsigned int csize;
        blocks[I*NJ*NK + J*NK + K] = compress(values, csize); // compress() to be chosen
        compressedSize[I*NJ*NK + J*NK + K] = csize;
    }
    float getValue(int i, int j, int k) {
        // block index and offset within the block
        int I = i/BLOCK_SIZE, J = j/BLOCK_SIZE, K = k/BLOCK_SIZE;
        int ii = i - I*BLOCK_SIZE;
        int jj = j - J*BLOCK_SIZE;
        int kk = k - K*BLOCK_SIZE;
        float *compressedBlock = blocks[I*NJ*NK + J*NK + K];
        unsigned int csize = compressedSize[I*NJ*NK + J*NK + K];
        float values[BLOCK_SIZE][BLOCK_SIZE][BLOCK_SIZE];
        decompress(compressedBlock, csize, values);
        return values[ii][jj][kk];
    }
    // number of blocks:
    int NI, NJ, NK;
    // number of samples:
    int ni, nj, nk;
    float** blocks;
    unsigned int* compressedSize;
};
For this to be useful I need a lossy compression that is:
extremely fast, also on small datasets (e.g. 64x64x64)
compresses quite hard (> 3x); never mind if it loses quite a bit of info.
Any good candidates?

It sounds like you're looking for a LOD (level of detail) adaptive mesh. It's a recurring theme in video games and terrain simulation.
For terrain, see here: http://vterrain.org/LOD/Papers/ -- look for the ROAM video, which IIRC is adaptive not only by distance but also by view direction.
For non-terrain entities, there is a huge body of work (here's one example: Generic Adaptive Mesh Refinement).

I would suggest using OctoMap to handle large 3D data, and extending it as shown here to handle geometrical properties.

Related

Copy an Eigen matrix of vectors

I have:
A, a 5x5 matrix (5 rows and 5 cols) whose cells are vectors of length depth.
depth = 3 (the length of the vector in each cell of matrix A).
B, a matrix of single values, of size 75 x Any (5*5*3 rows and Any cols).
x_size_kernel = 5.
block_idx, the column index; here for example we have made it equal to 0 (for one column of matrix B only).
The task of this simple and strict example is to copy all the vectors of matrix A into one column (the first) of matrix B.
Currently I solve the problem like this (a concrete example with specific data):
Eigen::MatrixXf B;
const int depth = 3;
const int x_size_kernel = 5, y_size_kernel = 5;
B = Eigen::MatrixXf(x_size_kernel * y_size_kernel * depth, 100).setZero();
Eigen::Matrix<Eigen::VectorXf, Eigen::Dynamic, Eigen::Dynamic> A;
A.resize(5, 5);
for (auto yy = 0; yy < A.rows(); yy++) {
    for (auto xx = 0; xx < A.cols(); xx++) {
        A(yy, xx).resize(depth);
    }
}
auto block_idx = 0;
// and here is the whole copy for one column of matrix B
for (auto my = 0; my < y_size_kernel; my++) {
    for (auto mx = 0; mx < x_size_kernel; mx++) {
        // copy the next cell's vector into its segment of the column
        B.col(block_idx).
            segment(mx * depth + my * x_size_kernel * depth, depth).noalias() =
            A(my, mx);
    }
}
But the above code is very slow, so I need something faster. Does anybody know how to copy the data this way in one pass, using only Eigen?
Thank you for helping.
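A sketch of one possible speed-up (under the same layout assumptions as above): the destination offsets mx * depth + my * x_size_kernel * depth are contiguous when my is the outer loop and mx the inner one, so each cell's vector can be copied with a raw memcpy through the column's data pointer instead of building an Eigen expression per segment:
#include <cstring>
// B is column-major, so B.col(block_idx) is contiguous in memory
float *dst = B.col(block_idx).data();
for (int my = 0; my < y_size_kernel; my++) {
    for (int mx = 0; mx < x_size_kernel; mx++) {
        std::memcpy(dst, A(my, mx).data(), sizeof(float) * depth);
        dst += depth;
    }
}
Whether this beats the segment() version should be measured; the deeper fix is probably to store A's data in one contiguous MatrixXf rather than as a matrix of heap-allocated vectors.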

Meaning of Parameters in BackgroundSubtractorGMG Algorithm in OpenCV 3.0

I am studying the GMG background subtraction algorithm as described in this paper. As OpenCV 3.0 also has an implementation of the GMG algorithm (as the additional package opencv_contrib), I try to study the two together. However, I am not quite sure about the meaning of the two parameters maxFeatures and quantizationLevels, as I want to map them to the descriptions in the paper.
Quoting the source code in OpenCV 3.0 (file modules\bgsegm\src\bgfg_gmg.cpp):
//! Total number of distinct colors to maintain in histogram.
int maxFeatures;
and
//! Number of discrete levels in each channel to be used in histograms.
int quantizationLevels;
And quoting the paper (Section II B) (with some mathematical symbols and variable names modified, since LaTeX is not supported here):
"... Of the T observed features, select the F_tot <= F_max most recently observed unique features; let I, a subset of {1, 2, ..., T} with |I| = F_tot, be the corresponding time index set. (If T > F_max, it is possible that F_tot, the number of distinct features observed, exceeds the limit F_max. In that case, we throw away the oldest observations so that F_tot <= F_max.) Then, we calculate an average to generate the initial histogram: H(T) = (1/F_tot)∑f(r). This puts equal weight, 1/F_tot, in F_tot unique bins of the histogram."
From the above description, I was originally convinced that maxFeatures in OpenCV 3.0 refers to F_max in the paper, and quantizationLevels refers to F_tot. However, this does not sound right, for two reasons: (1) the paper mentions that "F_tot is the number of distinct features observed", and (2) the OpenCV source code does not impose any relationship between maxFeatures and quantizationLevels, while the paper clearly suggests that the former should be larger than or equal to the latter.
So, what are the meanings of maxFeatures and quantizationLevels? And is quantizationLevels a parameter introduced by OpenCV for the calculation of the histogram?
After further studying the source code in OpenCV, I believe that maxFeatures refers to F_max in the paper, while quantizationLevels determines the number of bins in the histogram. The reason is as follows:
Consider the function insertFeature(), which contains the following code:
static bool insertFeature(unsigned int color, float weight, unsigned int* colors, float* weights, int& nfeatures, int maxFeatures)
{
    int idx = -1;
    for (int i = 0; i < nfeatures; ++i) {
        if (color == colors[i]) {
            // feature in histogram
            weight += weights[i];
            idx = i;
            break;
        }
    }
    if (idx >= 0) { // case 1
        // move feature to beginning of list
        ::memmove(colors + 1, colors, idx * sizeof(unsigned int));
        ::memmove(weights + 1, weights, idx * sizeof(float));
        colors[0] = color;
        weights[0] = weight;
    }
    else if (nfeatures == maxFeatures) { // case 2
        // discard oldest feature
        ::memmove(colors + 1, colors, (nfeatures - 1) * sizeof(unsigned int));
        ::memmove(weights + 1, weights, (nfeatures - 1) * sizeof(float));
        colors[0] = color;
        weights[0] = weight;
    }
    else { // case 3
        colors[nfeatures] = color;
        weights[nfeatures] = weight;
        ++nfeatures;
        return true;
    }
    return false;
}
Case 1: When the color matches an item in the array colors[], its weight is accumulated and the feature is moved to the front of the list.
Case 2: When the color does not match any item in the array colors[] and maxFeatures is reached (nfeatures stores the number of items in the array), the oldest feature is discarded to make room for the new color.
Case 3: When the color does not match any item in the array colors[] and maxFeatures is not yet reached, the color is added as a new item of the array, and nfeatures is incremented by 1.
Hence maxFeatures should correspond to F_max (the maximum number of features) in the paper.
In addition, in the function apply():
static unsigned int apply(const void* src_, int x, int cn, double minVal, double maxVal, int quantizationLevels)
{
    const T* src = static_cast<const T*>(src_);
    src += x * cn;
    unsigned int res = 0;
    for (int i = 0, shift = 0; i < cn; ++i, ++src, shift += 8)
        res |= static_cast<int>((*src - minVal) * quantizationLevels / (maxVal - minVal)) << shift;
    return res;
}
This function maps the color intensity of the pixel pointed to by src_ to a bin according to the values of maxVal, minVal and quantizationLevels, such that if quantizationLevels = q, the result of the code:
static_cast<int>((*src - minVal) * quantizationLevels / (maxVal - minVal))
is an integer in the range [0, q-1]. However, there can be cn channels (for example, cn = 3 for RGB, hence the shift operation), so the possible number of bins (denote it as b) is quantizationLevels to the power of cn. If b > F_max, the oldest features may eventually have to be discarded.
Hence, in OpenCV, if we set maxFeatures >= quantizationLevels ^ cn, we never have to discard the oldest features, because we allow more than enough bins, or more than enough distinct features.
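To make the counting concrete, take some example numbers (these are illustrative, not OpenCV defaults):
// minVal = 0, maxVal = 255, quantizationLevels = 16, cn = 3 (RGB)
// one channel value of 200 maps to bin static_cast<int>(200 * 16 / 255) = 12
// the three per-channel bins are packed into one integer, 8 bits apart
// possible distinct features: b = 16^3 = 4096
// so maxFeatures >= 4096 guarantees that no feature is ever discarded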

Is this part of a real IFFT process really optimal?

When calculating an (I)FFT it is possible to process "N*2 real" data points using an ordinary complex (I)FFT of N data points.
Not sure about my terminology here, but this is how I've read it described.
There are several posts about this on stackoverflow already.
This can speed things up a bit when only dealing with such "real" data which is often the case when dealing with for example sound (re-)synthesis.
This increase in speed is offset by the need for a pre-processing step that somehow... uhh... fidaddles? the data to achieve this. Look, I'm not even going to try to convince anyone I fully understand this, but thanks to the previously mentioned threads I came up with the following routine, which does the job nicely (thank you!).
However, on my microcontroller this costs a bit more than I'd like, even though the trigonometric functions are already optimized with LUTs.
But the routine itself looks like it should be possible to optimize mathematically to minimize processing. To me it seems similar to a plain 2D rotation. I just can't quite wrap my head around it, but it feels like this could be done with fewer trigonometric calls and fewer arithmetic operations.
I was hoping perhaps someone else might easily see what I don't and provide some insight into how this math may be simplified.
This particular routine is for use with IFFT, before the bit-reversal stage.
pseudo-version:
INPUT
MAG_A/B = 0 TO 1
PHA_A/B = 0 TO 2PI
INDEX = 0 TO PI/2
r = MAG_A * sin(PHA_A)
i = MAG_B * sin(PHA_B)
rsum = r + i
rdif = r - i
r = MAG_A * cos(PHA_A)
i = MAG_B * cos(PHA_B)
isum = r + i
idif = r - i
r = -cos(INDEX)
i = -sin(INDEX)
rtmp = r * isum + i * rdif
itmp = i * isum - r * rdif
OUTPUT rsum + rtmp
OUTPUT itmp + idif
OUTPUT rsum - rtmp
OUTPUT itmp - idif
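For reference, writing the last four multiplies in complex form (with c = cos(INDEX), s = sin(INDEX), so r = -c and i = -s):
rtmp + j*itmp = -(c + j*s) * conj(isum + j*rdif)
The 2x2 matrix acting on (isum, rdif) has determinant -1, so this is a reflection (a rotation combined with a conjugation) rather than a plain rotation, which may be why it resists being rewritten as a single rotation. A complex multiply can be done with 3 multiplications and 5 additions instead of 4 and 2, but the two sin/cos pairs for the magnitude/phase terms and the one twiddle pair appear to be irreducible as written.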
original working code, if that's your poison:
void fft_nz_set(fft_complex_t complex[], unsigned bits, unsigned index, int32_t mag_lo, int32_t pha_lo, int32_t mag_hi, int32_t pha_hi) {
    unsigned size = 1 << bits;
    unsigned shift = SINE_TABLE_BITS - (bits - 1);
    unsigned n = index;        // index for mag_lo, pha_lo
    unsigned z = size - index; // index for mag_hi, pha_hi
    int32_t rsum, rdif, isum, idif, r, i;
    r = smmulr(mag_lo, sine(pha_lo));   // mag_lo * sin(pha_lo)
    i = smmulr(mag_hi, sine(pha_hi));   // mag_hi * sin(pha_hi)
    rsum = r + i; rdif = r - i;
    r = smmulr(mag_lo, cosine(pha_lo)); // mag_lo * cos(pha_lo)
    i = smmulr(mag_hi, cosine(pha_hi)); // mag_hi * cos(pha_hi)
    isum = r + i; idif = r - i;
    r = -sinetable[(1 << SINE_BITS) - (index << shift)]; // cos(pi_c * (index / size) / 2)
    i = -sinetable[index << shift];                      // sin(pi_c * (index / size) / 2)
    int32_t rtmp = smmlar(r, isum, smmulr(i, rdif)) << 1; // r * isum + i * rdif
    int32_t itmp = smmlsr(i, isum, smmulr(r, rdif)) << 1; // i * isum - r * rdif
    complex[n].r = rsum + rtmp;
    complex[n].i = itmp + idif;
    complex[z].r = rsum - rtmp;
    complex[z].i = itmp - idif;
}

// For reference, this would be used as follows to generate a sawtooth (after IFFT)
void synth_sawtooth(fft_complex_t *complex, unsigned fft_bits) {
    unsigned fft_size = 1 << fft_bits;
    fft_sym_dc(complex, 0, 0); // sets dc bin [0]
    for(unsigned n = 1, z = fft_size - 1; n <= fft_size >> 1; n++, z--) {
        // calculation of amplitude/index (sawtooth) for both n and z
        fft_sym_magnitude(complex, fft_bits, n, 0x4000000 / n, 0x4000000 / z);
    }
}

PyCUDA - passing a matrix by reference from python to C++ CUDA code

I have to write a PyCUDA function that gets two matrices, Nx3 and Mx3, and returns an NxM matrix, but I can't figure out how to pass a matrix by reference without knowing the number of columns.
My code is basically something like this:
#kernel declaration
mod = SourceModule("""
__global__ void distance(int N, int M, float d1[][3], float d2[][3], float res[][M])
{
    int i = threadIdx.x;
    int j = threadIdx.y;
    float x, y, z;
    x = d2[j][0]-d1[i][0];
    y = d2[j][1]-d1[i][1];
    z = d2[j][2]-d1[i][2];
    res[i][j] = x*x + y*y + z*z;
}
""")
#load data
data1 = numpy.loadtxt("data1.txt").astype(numpy.float32) # Nx3 matrix
data2 = numpy.loadtxt("data2.txt").astype(numpy.float32) # Mx3 matrix
N=data1.shape[0]
M=data2.shape[0]
res = numpy.zeros([N,M]).astype(numpy.float32) # NxM matrix
#invoke kernel
dist_gpu = mod.get_function("distance")
dist_gpu(cuda.In(numpy.int32(N)), cuda.In(numpy.int32(M)), cuda.In(data1), cuda.In(data2), cuda.Out(res), block=(N,M,1))
#save data
numpy.savetxt("results.txt", res)
Compiling this I receive an error:
kernel.cu(3): error: a parameter is not allowed
That is, I cannot use M as the number of columns of res[][] in the declaration of the function (a runtime value cannot be used as an array dimension in a C++ parameter declaration). Nor can I leave the number of columns undeclared.
I need an NxM matrix as output, but I can't figure out how to do this. Can you help me?
You should use pitched linear memory access inside the kernel. That is how ndarray and gpuarray store data internally, and PyCUDA will pass a pointer to the GPU memory allocated for a gpuarray when it is supplied as an argument to a PyCUDA kernel. So (if I understand what you are trying to do) your kernel should be written something like this:
__device__ unsigned int idx2d(int i, int j, int lda)
{
    return j + i*lda;
}

__global__ void distance(int N, int M, float *d1, float *d2, float *res)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    int j = threadIdx.y + blockDim.y * blockIdx.y;
    float x, y, z;
    x = d2[idx2d(j,0,3)]-d1[idx2d(i,0,3)];
    y = d2[idx2d(j,1,3)]-d1[idx2d(i,1,3)];
    z = d2[idx2d(j,2,3)]-d1[idx2d(i,2,3)];
    res[idx2d(i,j,M)] = x*x + y*y + z*z; // row stride is M (the number of columns)
}
Here I have assumed the numpy default row-major ordering in defining the idx2d helper function. There are still problems with the Python side of the code you posted, but I guess you know that already.
EDIT: Here is a complete working repro case based on the code posted in your question. Note that it only uses a single block (like the original), so be mindful of block and grid dimensions when trying to run it on anything other than trivially small cases.
import numpy as np
from pycuda import compiler, driver
from pycuda import autoinit
#kernel declaration
mod = compiler.SourceModule("""
__device__ unsigned int idx2d(int i, int j, int lda)
{
    return j + i*lda;
}

__global__ void distance(int N, int M, float *d1, float *d2, float *res)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    int j = threadIdx.y + blockDim.y * blockIdx.y;
    float x, y, z;
    x = d2[idx2d(j,0,3)]-d1[idx2d(i,0,3)];
    y = d2[idx2d(j,1,3)]-d1[idx2d(i,1,3)];
    z = d2[idx2d(j,2,3)]-d1[idx2d(i,2,3)];
    res[idx2d(i,j,M)] = x*x + y*y + z*z; // row stride is M (the number of columns)
}
""")
#make data
data1 = np.random.uniform(size=18).astype(np.float32).reshape(-1,3)
data2 = np.random.uniform(size=12).astype(np.float32).reshape(-1,3)
N=data1.shape[0]
M=data2.shape[0]
res = np.zeros([N,M]).astype(np.float32) # NxM matrix
#invoke kernel
dist_gpu = mod.get_function("distance")
dist_gpu(np.int32(N), np.int32(M), driver.In(data1), driver.In(data2), \
driver.Out(res), block=(N,M,1), grid=(1,1))
print(res)
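Note that if you do run it with more than one block, the kernel also needs a bounds check, because the grid will usually overshoot the NxM domain:
// guard threads that fall outside the N x M result matrix
if (i < N && j < M)
    res[idx2d(i,j,M)] = x*x + y*y + z*z;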

Least Squares solution to simultaneous equations

I am trying to fit a transformation from one set of coordinates to another.
x' = R + Px + Qy
y' = S - Qx + Py
where P, Q, R, S are constants, with P = scale*cos(rotation) and Q = scale*sin(rotation).
There is a well-known 'by hand' formula for fitting P, Q, R, S to a set of corresponding points.
But I need an error estimate on the fit, so I need a least squares solution.
I've read 'Numerical Recipes', but I'm having trouble working out how to do this for data sets with both x and y in them.
Can anyone point to an example/tutorial/code sample of how to do this?
I'm not too bothered about the language.
But "just use the built-in feature of Matlab/LAPACK/numpy/R" is probably not helpful!
edit:
I have a large set of old(x,y) -> new(x,y) pairs to fit to. The problem is overdetermined (more data points than unknowns), so simple matrix inversion isn't enough, and as I said I really need the error on the fit.
The following code should do the trick. I used the following formula for the residuals:
residual[i] = (computed_x[i] - actual_x[i])^2 + (computed_y[i] - actual_y[i])^2
I then derived the least-squares formulae based on the general procedure described at Wolfram's MathWorld.
I tested this algorithm out in Excel and it performs as expected. I used a collection of ten random points which were then rotated, translated and scaled by a randomly generated transformation matrix.
With no random noise applied to the output data, this program produces four parameters (P, Q, R, and S) which are identical to the input parameters, and an rSquared value of zero.
As more and more random noise is applied to the output points, the constants start to drift away from the correct values, and the rSquared value increases accordingly.
Here is the code:
// test data
const int N = 1000;
float oldPoints_x[N] = { ... };
float oldPoints_y[N] = { ... };
float newPoints_x[N] = { ... };
float newPoints_y[N] = { ... };
// compute various sums and sums of products
// across the entire set of test data
float Ex = Sum(oldPoints_x, N);
float Ey = Sum(oldPoints_y, N);
float Exn = Sum(newPoints_x, N);
float Eyn = Sum(newPoints_y, N);
float Ex2 = SumProduct(oldPoints_x, oldPoints_x, N);
float Ey2 = SumProduct(oldPoints_y, oldPoints_y, N);
float Exxn = SumProduct(oldPoints_x, newPoints_x, N);
float Exyn = SumProduct(oldPoints_x, newPoints_y, N);
float Eyxn = SumProduct(oldPoints_y, newPoints_x, N);
float Eyyn = SumProduct(oldPoints_y, newPoints_y, N);
// compute the transformation constants
// using least-squares regression
float divisor = Ex*Ex + Ey*Ey - N*(Ex2 + Ey2);
float P = (Exn*Ex + Eyn*Ey - N*(Exxn + Eyyn))/divisor;
float Q = (Exn*Ey + Eyn*Ex + N*(Exyn - Eyxn))/divisor;
float R = (Exn - P*Ex - Q*Ey)/N;
float S = (Eyn - P*Ey + Q*Ex)/N;
// compute the rSquared error value
// low values represent a good fit
float rSquared = 0;
float x;
float y;
for (int i = 0; i < N; i++)
{
    x = R + P*oldPoints_x[i] + Q*oldPoints_y[i];
    y = S - Q*oldPoints_x[i] + P*oldPoints_y[i];
    // note: ^ is the XOR operator in C/C++, so square by multiplying
    rSquared += (x - newPoints_x[i]) * (x - newPoints_x[i]);
    rSquared += (y - newPoints_y[i]) * (y - newPoints_y[i]);
}
To find P, Q, R, and S, you can use least squares. I think the confusing thing is that the usual description of least squares uses x and y, but they don't match the x and y in your problem. You just need to translate your problem carefully into the least squares framework. In your case the independent variables are the untransformed coordinates x and y, the dependent variables are the transformed coordinates x' and y', and the adjustable parameters are P, Q, R, and S. (If this isn't clear enough, let me know and I'll post more detail.)
Once you've found P, Q, R, and S, then scale = sqrt(P^2 + Q^2), and you can then find the rotation from sin(rotation) = Q/scale and cos(rotation) = P/scale.
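For example, in code (using atan2 to recover the angle in the correct quadrant):
double scale = sqrt(P*P + Q*Q);
double rotation = atan2(Q, P); // since P = scale*cos(rotation), Q = scale*sin(rotation)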
You can use the levmar program to calculate this. It's tested and integrated into multiple products, including mine. It's licensed under the GPL, but if this is a non-open-source project, the author will change the license for you (for a fee).
Define the 3x3 matrix T(P,Q,R,S) such that (x',y',1) = T (x,y,1). Then compute
A = Σ_i |T(x_i, y_i, 1) - (x'_i, y'_i, 1)|^2
and minimize A against (P,Q,R,S).
Coding this yourself is a medium to large sized project unless you can guarantee that the data are well conditioned, especially when you want good error estimates out of the procedure. You're probably best off using an existing minimizer that supports error estimates.
Particle physics types would use MINUIT, either directly from CERNLIB (with the coding most easily done in Fortran 77) or from ROOT (with the coding in C++; it should also be accessible through the Python bindings). But that is a big installation if you don't have one of these tools already.
I'm sure that others can suggest other minimizers.
Thanks eJames, that's almost exactly what I have. I coded it from an old army surveying manual that was based on an earlier "Instructions to Surveyors" note that must be 100 years old! (It uses N and E for North and East rather than x/y.)
The goodness-of-fit parameter will be very useful; I can interactively throw out selected points if they make the fit worse.
void FindTransformation(vector<Point2D> known, vector<Point2D> unknown)
{
    double sum_e = 0, sum_n = 0, sum_E = 0, sum_N = 0;
    unsigned int n = 0;
    // sums
    for (unsigned int ii = 0; ii < known.size(); ii++) {
        sum_e += unknown[ii].x;
        sum_n += unknown[ii].y;
        sum_E += known[ii].x;
        sum_N += known[ii].y;
        ++n;
    }
    // mean position
    double me = sum_e/(double)n;
    double mn = sum_n/(double)n;
    double mE = sum_E/(double)n;
    double mN = sum_N/(double)n;
    // differences
    double de, dn;
    double sum_deE = 0, sum_dnN = 0, sum_dee = 0, sum_dnn = 0, sum_dnE = 0, sum_deN = 0;
    for (unsigned int ii = 0; ii < known.size(); ii++) {
        de = unknown[ii].x - me;
        dn = unknown[ii].y - mn;
        // for P
        sum_deE += (de*known[ii].x);
        sum_dnN += (dn*known[ii].y);
        sum_dee += (de*unknown[ii].x);
        sum_dnn += (dn*unknown[ii].y);
        // for Q
        sum_dnE += (dn*known[ii].x);
        sum_deN += (de*known[ii].y);
    }
    double P = (sum_deE + sum_dnN) / (sum_dee + sum_dnn);
    double Q = (sum_dnE - sum_deN) / (sum_dee + sum_dnn);
    double R = mE - (P*me) - (Q*mn);
    double S = mN + (Q*me) - (P*mn);
}
One issue is that numeric code like this is often tricky. Even when the algorithms are straightforward, problems often show up in the actual computation.
For that reason, if there is an easily available system with this as a built-in feature, it might be best to use that.
