I'm writing CG shaders for advanced lighting calculation for game based on Unity. Sometimes it is needed to sum all vector components. There are two ways to do it:
Just write something like:
float sum = v.x + v.y + v.z;
Or do something like:
float sum = dot(v,float3(1,1,1));
I am really curious about what is faster and looks better for code style.
It's obvious that if we have same question for CPU calculations, the first simle way is much better. Because of:
a) There is no need to allocate another float(1,1,1) vector
b) There is no need to multiply every original vector "v" components by 1.
But since we do it in shader code, which runs on GPU, I belive there is some great hardware optimization for dot product function, and may be allocation of float3(1,1,1) will be translated in no allocation at all.
float4 _someVector;
void surf (Input IN, inout SurfaceOutputStandard o){
float sum = _someVector.x + _someVector.y + _someVector.z + _someVector.w;
// VS
float sum2 = dot(_someVector, float4(1,1,1,1));
}
Check this link.
Vec3 Dot has a cost of 3 cycles, while Scalar Add has a cost of 1.
Thus, in almost all platforms (AMD and NVIDIA):
float sum = v.x + v.y + v.z; has a cost of 2
float sum = dot(v,float3(1,1,1)); has a cost of 3
The first implementation should be faster.
Implementation of the Dot product in cg: https://developer.download.nvidia.com/cg/dot.html
IMHO difference is immeasurable, in 98% of the cases, but first one should be faster, because multiplication is a "more expensive" operation
I have in mind the following experiment to run in Matlab and I am asking for an help to implement step (3). Any suggestion would be very appreciated.
(1) Consider the random variables X and Y both uniformly distributed on [0,1]
(2) Draw N realisation from the joint distribution of X and Y assuming that X and Y are independent (meaning that X and Y are uniformly jointly distributed on [0,1]x[0,1]). Each draw will be in [0,1]x[0,1].
(3) Transform each draw in [0,1]x[0,1] in a draw in [0,1] using the Hilbert space filling curve: under the Hilbert curve mapping, the draw in [0,1]x[0,1] should be the image of one (or more because of surjectivity) point(s) in [0,1]. I want pick one of these points. Is there any pre-built package in Matlab doing this?
I found this answer which I don't think does what I want as it explains how to obtain the Hilbert value of the draw (curve length from the start of curve to the picked point)
On wikipedia I found this code in C language (from (x,y) to d) which, again, does not fulfil my question.
EDIT This answer does not address updated version of the question, which explicitly asks about constructing Hilbert curve. Instead, this answer addresses a related question on construction of bijective mapping, and the relation to uniform distribution.
Your problem in not really well defined. If you only need the resulting distribution to be uniform, nothing is stopping you from simply picking f:(X,Y)->X. Result would be uniform regardless of whether X and Y are correlated. From your post I can only presume that what you want, in fact, is for the resulting transformation to be bijective, or as close to it as possible given machine precision limitations.
Worth noting that unless you need the algorithm that is best in preserving locality (which is clearly not required for resulting distribution to be bijective, not to mention uniform), there's no need to bother constructing Hilbert curves that you mention in your question. They have just as much to do with the solution as any other space-filling curve, and are incredibly computationally intensive.
So assuming you're looking for a bijective mapping, your question is equivalent to asking whether the set of points in a [unit] square has the same cardinality as the set of points in a [unit] line segment, and if it is, how to construct that bijection, i.e. 1-to-1 correspondence. The intuition says the square should have a higher cardinality, and Cantor spent 3 years trying to prove that, eventually proving quite the opposite - that these sets are in fact equinumerous. He was so surprised at his discovery that he wrote:
I see it, but I don't believe it!
The most commonly referred to bijection, fulfilling** this criteria, is the following. Represent x and y in their decimal form, i.e. x=0. x1 x2 x3 x4 x5..., and y=0. y1 y2 y3 y4 y5..., and let f:(X,Y)->Z be z=0. x1 y1 x2 y2 x3 y3 x4 y4 x5 y5..., i.e. alternating the decimals of the two numbers. The idea behind the bijection is trivial, though a rigorous proof requires quite a bit of prior knowledge.
** The caveat is that if we take e.g. x = 1/3 = 0.33333... and y = 1/5 = 0.199999... = 0.200000..., we can see there are two sequences corresponding to them: z = 0.313939393939... and z = 0.323030303030.... To overcome this obstacle we have to prove that adding a countable set to an uncountable one does not change the cardinality of the latter.
In reality we have to deal with machine precision and not pure math, which strictly speaking means both sets are actually finite and hence not equinumerous (assuming you store result with the same precision as original numbers). Which means we're simply forced to do some assumptions and loose some information, such as, in this case, the last half of significant digits of x and y. That is, unless we use a different data type that allows to store result with a double precision, compared to that of original variables.
Finally, sample implementation in Matlab:
x = rand();
y = rand();
chars = [num2str(x, '%.17f'); num2str(y, '%.17f')];
z = str2double(['0.' reshape(chars(:,3:end), 1, [])]);
>> cellstr(['x=' num2str(x, '%.17f'); 'y=' num2str(y, '%.17f'); 'z=' num2str(z, '%.17f')])
ans =
'x=0.65549803980353738'
'y=0.10975505072305158'
'z=0.61505947958500362'
Edit This answers the original request for a transformation f(x,y) -> t ~ U[0,1] given x,y ~ U[0,1], and additionally for x and y correlated. The updated question asks specifically for a Hilbert curve, H(x,y) -> t ~ U[0,1] and only for x,y ~ U[0,1] so this answer is no longer relevant.
Consider a random uniform sequence in [0,1] r1, r2, r3, .... You are assigning this sequence to pairs of numbers (x1,y1), (x2,y2), .... What you are asking for is a transformation on pairs (x,y) which yield a uniform random number in [0,1].
Consider the random subsequence r1, r3, ... corresponding to x1, x2, .... If you trust that your number generator is random and uncorrelated in [0,1], then the subsequence x1, x2, ... should also be random and uncorrelated in [0,1]. So the rather simple answer to the first part of your question is a projection onto either the x or y axis. That is, just pick x.
Next consider correlations between x and y. Since you haven't specified the nature of the correlation, let's assume a simple scaling of the axes,
such as x' => [0, 0.5], y' => [0, 3.0], followed by a rotation. The scaling doesn't introduce any correlation since x' and y' are still independent. You can generate it easily enough with a matrix multiply:
M1*p = [x_scale, 0; 0, y_scale] * [x; y]
for matrix M1 and point p. You can introduce a correlation by taking this stretched form and rotating it by theta:
M2*M1*p = [cos(theta), sin(theta); -sin(theta), cos(theta)]*M1*p
Putting it all together with theta = pi/4, or 45 degrees, you can see that larger values of y are correlated with larger values of x:
cos_t = sin_t = cos(pi/4); % at 45 degrees, sin(t) = cos(t) = 1/sqrt(2)
M2 = [cos_t, sin_t; -sin_t, cos_t];
M1 = [0.5, 0.0; 0.0, 3.0];
p = random(2,1000);
p_prime = M2*M1*p;
plot(p_prime(1)', p_prime(2)', '.');
axis('equal');
The resulting plot* shows a band of uniformly distributed numbers at a 45 degree angle:
Further transformations are possible with shear, and if you are clever about it, translation (OpenGL uses 4x4 transformation matrices so that translation can be represented as a linear transform matrix, with an extra dimension added before the transformation steps and removed before they are done).
Given a known affine correlation structure, you can transform back from random points (x',y') to points (x,y) where x and y are independent in [0,1] by solving Mk*...*M1 p = p_prime for p, or equivalently, by setting p = inv(Mk*...*M1) * p_prime, where p=[x;y]. Again, just pick x, which will be uniform in [0,1]. This doesn't work if the transformation matrix is singular, e.g., if you introduce a projection matrix Mj into the mix (though if the projection is the first step you can still recover).
* You may notice that the plot is from python rather than matlab. I don't have matlab or octave sitting in front of me right now, so I hope I got the syntax details right.
You could compute the hilbert curve from f(x,y)=z. Basically it's a hamiltonian path traversal. You can find a good description at Nick's spatial index hilbert curve quadtree blog. Or take a look at monotonic n-ary gray code. I've written an implementation based on Nick's blog in php:http://monstercurves.codeplex.com.
I will focus only on your last point
(3) Transform each draw in [0,1]x[0,1] in a draw in [0,1] using the Hilbert space filling curve: under the Hilbert curve mapping, the draw in [0,1]x[0,1] should be the image of one (or more because of surjectivity) point(s) in [0,1]. I want pick one of these points. Is there any pre-built package in Matlab doing this?
As far as I know, there aren't pre-built packages in Matlab doing this, but the good news is that the code on wikipedia can be called from MATLAB, and it is as simple as putting together the conversion routine with a gateway function in a xy2d.c file:
#include "mex.h"
// source: https://en.wikipedia.org/wiki/Hilbert_curve
// rotate/flip a quadrant appropriately
void rot(int n, int *x, int *y, int rx, int ry) {
if (ry == 0) {
if (rx == 1) {
*x = n-1 - *x;
*y = n-1 - *y;
}
//Swap x and y
int t = *x;
*x = *y;
*y = t;
}
}
// convert (x,y) to d
int xy2d (int n, int x, int y) {
int rx, ry, s, d=0;
for (s=n/2; s>0; s/=2) {
rx = (x & s) > 0;
ry = (y & s) > 0;
d += s * s * ((3 * rx) ^ ry);
rot(s, &x, &y, rx, ry);
}
return d;
}
/* The gateway function */
void mexFunction( int nlhs, mxArray *plhs[],
int nrhs, const mxArray *prhs[])
{
int n; /* input scalar */
int x; /* input scalar */
int y; /* input scalar */
int *d; /* output scalar */
/* check for proper number of arguments */
if(nrhs!=3) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:nrhs","Three inputs required.");
}
if(nlhs!=1) {
mexErrMsgIdAndTxt("MyToolbox:arrayProduct:nlhs","One output required.");
}
/* get the value of the scalar inputs */
n = mxGetScalar(prhs[0]);
x = mxGetScalar(prhs[1]);
y = mxGetScalar(prhs[2]);
/* create the output */
plhs[0] = mxCreateDoubleScalar(xy2d(n,x,y));
/* get a pointer to the output scalar */
d = mxGetPr(plhs[0]);
}
and compile it with mex('xy2d.c').
The above implementation
[...] assumes a square divided into n by n cells, for n a power of 2, with integer coordinates, with (0,0) in the lower left corner, (n-1,n-1) in the upper right corner.
In practice, a discretization step is required before applying the mapping. As in every discretization problem, it is crucial to choose the precision wisely. The snippet below puts everything together.
close all; clear; clc;
% number of random samples
NSAMPL = 100;
% unit square divided into n-by-n cells
% has to be a power of 2
n = 2^2;
% quantum
d = 1/n;
N = 0:d:1;
% generate random samples
x = rand(1,NSAMPL);
y = rand(1,NSAMPL);
% discretization
bX = floor(x/d);
bY = floor(y/d);
% 2d to 1d mapping
dd = zeros(1,NSAMPL);
for iid = 1:length(dd)
dd(iid) = xy2d(n, bX(iid), bY(iid));
end
figure;
hold on;
axis equal;
plot(x, y, '.');
plot(repmat([0;1], 1, length(N)), repmat(N, 2, 1), '-r');
plot(repmat(N, 2, 1), repmat([0;1], 1, length(N)), '-r');
figure;
plot(1:NSAMPL, dd);
xlabel('# of sample')
So I'm currently working on a Java Processing program where I want to simulate high numbers of particles interacting with collision and gravity. This obviously causes some performance issue when particle count gets high, so I try my best to optimize and avoid expensive operations such as square-root, otherwise used in finding distance between two points.
However, now I'm wondering how I could do the algoritm that figures out the direction a particle should move, given it only knows the distance squared and the difference between particles' x and y (dx, dy).
Here's a snip of the code (yes, I know I should use vectors instead of seperate x/y-couples. Yes, I know I should eventually handle particles by grids and clusters for further optimization) Anyways:
void applyParticleGravity(){
int limit = 2*particleRadius+1; //Gravity no longer applied if particles are within collision reach of eachother.
float ax, ay, bx, by, dx, dy;
float distanceSquared, f;
float gpp = GPP; //Constant is used, since simulation currently assumes all particles have equal mass: GPP = Gravity constant * Particle Mass * Particle Mass
Vector direction = new Vector();
Particle a, b;
int nParticles = particles.size()-1; //"particles" is an arraylist with particles objects, each storing an x/y coordinate and velocity.
for (int i=0; i<nParticles; i++){
a = particles.get(i);
ax = a.x;
ay = a.y;
for (int j=i+1; j<nParticles; j++){
b = particles.get(j);
bx = b.x;
by = b.y;
dx = ax-bx;
dy = ay-by;
if (Math.abs(dx) > limit && Math.abs(dy) > limit){ //Not too close to eachother
distanceSquared = dx*dx + dy*dy; //Avoiding square roots
f = gpp/distanceSquared; //Gravity formula: Force = G*(m1*m2)/d^2
//Perform some trigonometric magic to decide direction.x and direction.y as a numbet between -1 and 1.
a.fx += f*direction.x; //Adds force to particle. At end of main iteration, x-position is increased by fx/mass and so forth.
a.fy += f*direction.y;
b.fx -= f*direction.x; //Apply inverse force to other particle (Newton's 3rd law)
b.fy -= f*direction.y;
}
}
}
}
Is there a more accurate way of deciding the x and y pull strength with some trigonometric magic without killing performance when particles are several hundreds? Something I thought about was doing some sort of (int)dx/dy with % operator or so and get an index of a pre-calculated array of values.
Anyone have a clue? Thanks!
hehe, I think we're working on the same kind of thing, except I'm using HTML5 canvas. I came across this trying to figure out the same thing. I didn't find anything but I figured out what I was going for, and I think it will work for you too.
You want an identity vector that points from one particle to the another. The length will be 1, and x and y will be between -1 and 1. Then you take this identity vector and multiply it by your force scalar, which you're already calculating
To "point at" one particle from another, without using square root, first get the heading (in radians) from particle1 to particle2:
heading = Math.atan2(dy, dx)
Note that y is first, I think this is how it works in Java. I used x first in Javascript and that worked for me.
Get the x and y components of this heading using sin/cos:
direction.x = Math.sin(heading)
direction.y = Math.cos(heading)
You can see an example here:
https://github.com/nijotz/triforces/blob/c7b85d06cf8a65713d9b84ae314d5a4a015876df/src/cljs/triforces/core.cljs#L41
It's Clojurescript, but it may help.
Long time listener, first time caller.
So I have been playing around with the Android NDK and I'm at a point where I want to Unproject a tap to world coordinates but I can't make it work.
The problem is the x and y values for both the near and far points are the same which doesn't seem right for a perspective projection. Everything in the scene draws OK so I'm a bit confused why it wouldn't unproject properly, anyway here is my code please help thanks
//x and y are the normalized screen coords
ndk_helper::Vec4 nearPoint = ndk_helper::Vec4(x, y, 1.f, 1.f);
ndk_helper::Vec4 farPoint = ndk_helper::Vec4(x, y, 1000.f, 1.f);
ndk_helper::Mat4 inverseProjView = this->matProjection * this->matView;
inverseProjView = inverseProjView.Inverse();
nearPoint = inverseProjView * nearPoint;
farPoint = inverseProjView * farPoint;
nearPoint = nearPoint *(1 / nearPoint.w_);
farPoint = farPoint *(1 / farPoint.w_);
Well, after looking at the vector/matrix math code in ndk_helper, this isn't a surprise. In short: Don't use it. After scanning through it for a couple of minutes, it has some obvious mistakes that look like simple typos. And particularly the Vec4 class is mostly useless for the kind of vector operations you need for graphics. Most of the operations assume that a Vec4 is a vector in 4D space, not a vector containing homogenous coordinates in 3D space.
If you want, you can check it out here, but be prepared for a few face palms:
https://android.googlesource.com/platform/development/+/master/ndk/sources/android/ndk_helper/vecmath.h
For example, this is the implementation of the multiplication used in the last two lines of your code:
Vec4 operator*( const float& rhs ) const
{
Vec4 ret;
ret.x_ = x_ * rhs;
ret.y_ = y_ * rhs;
ret.z_ = z_ * rhs;
ret.w_ = w_ * rhs;
return ret;
}
This multiplies a vector in 4D space by a scalar, but is completely wrong if you're operating with homogeneous coordinates. Which explains the results you are seeing.
I would suggest that you either write your own vector/matrix library that is suitable for graphics type operations, or use one of the freely available libraries that are tested, and used by others.
BTW, the specific values you are using for your test look somewhat odd. You definitely should not be getting the same results for the two vectors, but it's probably not what you had in mind anyway. For the z coordinate in your input vectors, you are using the distances of the near and far planes in eye coordinates. But then you apply the inverse view-projection matrix to those vectors, which transforms them back from clip/NDC space into world space. So your input vectors for this calculation should be in clip/NDC space, which means the z-coordinate values corresponding to the near/far plane should be at -1 and 1.
I saw that Fast Minimum Storage Ray/Triangle Intersection by Moller and Trumbore is frequently recommended.
The thing is, I don't mind pre-computing and storing any amounts of data, as long as it speeds-up the intersection.
So my question is, not caring about memory, what are the fastest methods of doing ray-triangle intersection?
Edit: I wont move the triangles, i.e. it is a static scene.
As others have mentioned, the most effective way to speed things up is to use an acceleration structure to reduce the number of ray-triangle intersections needed. That said, you still want your ray-triangle intersections to be fast. If you're happy to precompute stuff, you can try the following:
Convert your ray lines and your triangle edges to Plücker coordinates. This allows you to determine if your ray line passes through a triangle at 6 multiply/add's per edge. You will still need to compare your ray start and end points with the triangle plane (at 4 multiply/add's per point) to make sure it actually hits the triangle.
Worst case runtime expense is 26 multiply/add's total. Also, note that you only need to compute the ray/edge sign once per ray/edge combination, so if you're evaluating a mesh, you may be able to use each edge evaluation twice.
Also, these numbers assume everything is being done in homogeneous coordinates. You may be able to reduce the number of multiplications some by normalizing things ahead of time.
I have done a lot of benchmarks, and I can confidently say that the fastest (published) method is the one invented by Havel and Herout and presented in their paper Yet Faster Ray-Triangle Intersection (Using SSE4). Even without using SSE it is about twice as fast as Möller and Trumbore's algorithm.
My C implementation of Havel-Herout:
typedef struct {
vec3 n0; float d0;
vec3 n1; float d1;
vec3 n2; float d2;
} isect_hh_data;
void
isect_hh_pre(vec3 v0, vec3 v1, vec3 v2, isect_hh_data *D) {
vec3 e1 = v3_sub(v1, v0);
vec3 e2 = v3_sub(v2, v0);
D->n0 = v3_cross(e1, e2);
D->d0 = v3_dot(D->n0, v0);
float inv_denom = 1 / v3_dot(D->n0, D->n0);
D->n1 = v3_scale(v3_cross(e2, D->n0), inv_denom);
D->d1 = -v3_dot(D->n1, v0);
D->n2 = v3_scale(v3_cross(D->n0, e1), inv_denom);
D->d2 = -v3_dot(D->n2, v0);
}
inline bool
isect_hh(vec3 o, vec3 d, float *t, vec2 *uv, isect_hh_data *D) {
float det = v3_dot(D->n0, d);
float dett = D->d0 - v3_dot(o, D->n0);
vec3 wr = v3_add(v3_scale(o, det), v3_scale(d, dett));
uv->x = v3_dot(wr, D->n1) + det * D->d1;
uv->y = v3_dot(wr, D->n2) + det * D->d2;
float tmpdet0 = det - uv->x - uv->y;
int pdet0 = ((int_or_float)tmpdet0).i;
int pdetu = ((int_or_float)uv->x).i;
int pdetv = ((int_or_float)uv->y).i;
pdet0 = pdet0 ^ pdetu;
pdet0 = pdet0 | (pdetu ^ pdetv);
if (pdet0 & 0x80000000)
return false;
float rdet = 1 / det;
uv->x *= rdet;
uv->y *= rdet;
*t = dett * rdet;
return *t >= ISECT_NEAR && *t <= ISECT_FAR;
}
One suggestion could be to implement the octree (http://en.wikipedia.org/wiki/Octree) algorithm to partition your 3D Space into very fine blocks. The finer the partitioning the more memory required, but the better accuracy the tree gets.
You still need to check ray/triangle intersections, but the idea is that the tree can tell you when you can skip the ray/triangle intersection, because the ray is guaranteed not to hit the triangle.
However, if you start moving your triangle around, you need to update the Octree, and then I'm not sure it's going to save you anything.
Found this article by Dan Sunday:
Based on a count of the operations done up to the first rejection test, this algorithm is a bit less efficient than the MT (Möller & Trumbore) algorithm, [...]. However, the MT algorithm uses two cross products whereas our algorithm uses only one, and the one we use computes the normal vector of the triangle's plane, which is needed to compute the line parameter rI. But, when the normal vectors have been precomputed and stored for all triangles in a scene (which is often the case), our algorithm would not have to compute this cross product at all. But, in this case, the MT algorithm would still compute two cross products, and be less efficient than our algorithm.
http://geomalgorithms.com/a06-_intersect-2.html