Numerically stable way to compute (x-sin(x))/x^3 - precision

I'm looking for a very fast way to compute (x-sin(x))/(x^3) for all x using IEEE floating point arithmetic and standard trigonometric functions. At 0, it should return 1/6.
For sin(x)/x, it's sufficient to check if x=0 and return 1, otherwise just compute it using standard floating-point sin and division. For (1-cos(x))/x^2, if cos(x) <= 0 the expression is fine as is; otherwise, express it as (sin(x)/x)^2/(1+cos(x)).
But I can't figure out how to express (x-sin(x))/x^3.
So far, the best I have is to compute the infinite sum until it converges: $\sum_{n=0}^{\infty} \frac{1}{4^{n+1}} \, \frac{\sin(x/2^{n+1})}{x/2^{n+1}} \, \frac{1-\cos(x/2^{n+1})}{(x/2^{n+1})^2}$
but I'd prefer a closed form
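For concreteness, the summation loop I have in mind is something like this (a rough sketch; it converges in a few dozen iterations, which is part of why I'd prefer a closed form):
#include <math.h>

// (x-sin(x))/x**3 via the halving sum above -- sketch only
double sinmx_over_xcubed_sum (double x)
{
    double sum = 0.0, prev, scale = 0.25;
    double y = x * 0.5;                            /* first term uses x/2 */
    do {
        prev = sum;
        double s = (y == 0.0) ? 1.0 : sin (y) / y;        /* sin(y)/y */
        double t = (y == 0.0) ? 0.0 : sin (y * 0.5) / y;
        double c = (y == 0.0) ? 0.5 : 2.0 * t * t;        /* (1-cos(y))/y^2 */
        sum += scale * s * c;
        scale *= 0.25;                             /* 1/4^(n+1) */
        y *= 0.5;                                  /* x/2^(n+1) */
    } while (sum != prev);
    return sum;
}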

(1 - cos x) / x^2 is fundamentally different from (x - sin x) / x^3, in that unity can be constructed by trigonometric functions as sin^2 x + cos^2 x = 1, while the same is not true of x. This means we cannot transform the latter function into a numerically advantageous closed-form trigonometric formula. I thought long about this and also tried manipulating the formula with all trigonometric identities I am aware of. I would love to be proven wrong; that seems like a question for Math Stack Exchange. The easiest and most accurate way to implement the former function is
#include <math.h>   // fabs, sin, sqrt
#include <float.h>  // DBL_EPSILON

// (1-cos(x))/x**2
double cosm1_over_xsquared (double x)
{
    if (fabs (x) < sqrt (DBL_EPSILON)) {
        return 0.5;
    } else {
        double s = sin (x * 0.5) / x;
        return 2.0 * s * s;
    }
}
If the standard math library computes sin() with an error just slightly over half an ulp, this implementation computes (1 - cos x) / x^2 with an error no larger than 4 ulp. As a side note, this function also lends itself to the use of Kahan's self-correction technique, which he first demonstrated for the computation of (e^x - 1) / x in
William M. Kahan, "Interval arithmetic in the proposed IEEE floating point arithmetic standard." In Karl L. E. Nickel (ed.), Interval Arithmetic 1980, Academic Press 1980, pp. 99-128.
// (1-cos(x))/x**2 on [-3, 3] using Kahan's self-compensation technique
double cosm1_over_xsquared_kahan (double x)
{
    double u = cos (x);
    double n = 1.0 - u;
    if (n == 0.0) {
        return 0.5;
    }
    double d = acos (u);
    return n / (d * d);
}
If both cos() and acos() have a maximum error just slightly over half an ulp, this function returns results with an error of less than 5 ulps. Because cos, unlike e^x, is a periodic function, this approach works only on the restricted interval noted in the code comment.
The above suggests that we should shoot for an implementation of (x - sin x) / x^3 with a maximum error of about 4 ulp. Characterizing the naive computation, we find that it is adequate for |x| > 1 under this provision. Despite the narrow input domain for an alternate computation, Kahan's self-compensation technique does not work for this function. The old standby of math function implementers, a polynomial minimax approximation, works just fine, however. This results in the following code:
// (x-sin(x))/x**3
double sinmx_over_xcubed (double x)
{
    if (fabs(x) < 1.0) { // minimax approximation
        double x2 = x * x;
        double p = 7.5475867852548673E-13;
        p = p * x2 - 1.6057658525730946E-10;
        p = p * x2 + 2.5052098906959416E-8;
        p = p * x2 - 2.7557319191306421E-6;
        p = p * x2 + 1.9841269841218293E-4;
        p = p * x2 - 8.3333333333333055E-3;
        p = p * x2 + 1.6666666666666666E-1;
        return p;
    } else {
        return (x - sin (x)) / x / x / x;
    }
}
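If you want a quick sanity check of the routine above, a long double reference is accurate enough away from zero (a minimal sketch; it assumes sinmx_over_xcubed() from above and an accurate sinl()):
#include <math.h>
#include <stdio.h>

// Spot check against a long double reference; for |x| >= 0.125 the
// cancellation in x - sin(x) leaves long double with ample margin.
int main (void)
{
    for (double x = 0.125; x < 4.0; x *= 1.37) {
        long double ref = ((long double)x - sinl (x)) /
                          ((long double)x * x * x);
        double got = sinmx_over_xcubed (x);
        printf ("x=%.6f  rel.err=%.3e\n", x, (double)((got - ref) / ref));
    }
    return 0;
}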

Well, the Taylor series for this expression is:
1/6 - x^2/120 + x^4/5040 + O(x^6) (it converges for all x, including 0)
which should be pretty good for most applications.
Addendum
If you are trying to find the limit at 0: apply L'Hôpital's rule, since the expression has the form 0/0 as x -> 0:
lim(x->0) (x-sin(x))/x^3 = lim(x->0) (1-cos(x))/(3x^2) = lim(x->0) sin(x)/(6x) = 1/6
In other words, it's probably best to use an if statement for the neighborhood of x = 0 and there use the Taylor series, which will be a LOT faster than doing floating-point sin and cos unless you have purpose-built hardware or are using GPUs.
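A minimal sketch of that branch-plus-series idea (the 0.01 cutoff is a guess rather than a tuned threshold, and the naive branch still loses some bits for moderately small |x|; see the minimax answer above for a careful treatment):
#include <math.h>

// Sketch: truncated Taylor series near 0, naive formula elsewhere.
double sinmx_over_xcubed_taylor (double x)
{
    if (fabs (x) < 0.01) {
        double x2 = x * x;
        return 1.0/6.0 - x2/120.0 + (x2 * x2)/5040.0;
    }
    return (x - sin (x)) / (x * x * x);
}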

Related

Writing a vector sum in MATLAB

Suppose I have a function phi(x1,x2)=k1*x1+k2*x2 which I have evaluated over a grid, where the grid is a square with boundaries at -100 and 100 along both the x1 and x2 axes, with some step size, say h=0.1. Now I want to calculate this sum over the grid, which I'm struggling with:
What I was trying :
clear all
close all
clc
D=1; h=0.1;
D1 = -100;
D2 = 100;
X = D1 : h : D2;
Y = D1 : h : D2;
[x1, x2] = meshgrid(X, Y);
k1=2;k2=2;
phi = k1.*x1 + k2.*x2;
figure(1)
surf(X,Y,phi)
m1=-500:500;
m2=-500:500;
[M1,M2,X1,X2]=ndgrid(m1,m2,X,Y)
sys=@(m1,m2,X,Y) (k1*h*m1+k2*h*m2).*exp((-([X Y]-h*[m1 m2]).^2)./(h^2*D))
sum1=sum(sys(M1,M2,X1,X2))
MATLAB reports an error in ndgrid; any idea how I should code this?
MATLAB shows:
Error using repmat
Requested 10001x1001x2001x2001 (298649.5GB) array exceeds maximum array size preference. Creation of arrays greater
than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference
panel for more information.
Error in ndgrid (line 72)
varargout{i} = repmat(x,s);
Error in new_try1 (line 16)
[M1,M2,X1,X2]=ndgrid(m1,m2,X,Y)
Judging by your comments and your code, it appears as though you don't fully understand what the equation is asking you to compute.
To obtain the value M(x1,x2) at some given (x1,x2), you have to compute that sum over Z^2. Of course, using a numerical toolbox such as MATLAB, you could only ever hope to compute over some finite range of Z^2. In this case, since (x1,x2) covers the range [-100,100] x [-100,100], and h=0.1, it follows that m covers the range [-1000, 1000] x [-1000, 1000]. Example: m = (-1000, -1000) gives you mh = (-100, -100), which is the bottom-left corner of your domain. So really, phi(mh) is just phi(x1,x2) evaluated on all of your discretised points.
As an aside, since you need to compute |x-hm|^2, you can treat x = x1 + i x2 as a complex number to make use of MATLAB's abs function. If you were strictly working with vectors, you would have to use norm, which is OK too, but a bit more verbose. Thus, for some given x=(x10, x20), you would compute x-hm over the entire discretised plane as (x10 - x1) + i (x20 - x2).
Finally, you can compute 1 term of M at a time:
D=1; h=0.1;
D1 = -100;
D2 = 100;
X = (D1 : h : D2); % X is in rows (dim 2)
Y = (D1 : h : D2)'; % Y is in columns (dim 1)
k1=2;k2=2;
phi = k1*X + k2*Y;
M = zeros(length(Y), length(X));
for j = 1:length(X)
for i = 1:length(Y)
% treat (x - hm) as a complex number
x_hm = (X(j)-X) + 1i*(Y(i)-Y); % this computes x-hm for all m
M(i,j) = 1/(pi*D) * sum(sum(phi .* exp(-abs(x_hm).^2/(h^2*D)), 1), 2);
end
end
By the way, this computation takes quite a long time. You can consider either increasing h, reducing D1 and D2, or changing all three of them.

Evaluating multivariable recurrence relation as quickly as possible

I have a handful of functions, which are given some variables x, y, z, etc., and evaluate their respective multinomials with some constant coefficients at these points, returning the values. My variables are then set equal to these values, and the process is repeated. In pseudo-code:
repeat N times:
    x_new = f1(x, y, z)
    y_new = f2(x, y, z)
    z_new = f3(x, y, z)
    x = x_new
    y = y_new
    z = z_new
and the functions are something like:
f1(x, y, z)
    return c0 + c1*x + c2*y + c3*z + c4*x*x + c5*x*y + c6*x*z ...
Each iteration, I save the intermediate values of my variables and later graph the result. For the sake of having enough points to make the graph of sufficient quality, I need around 10 million data points. I have code that does this fairly fast for 3 variables and multinomials of degree 3, but I'm seeking to expand this to support an arbitrary number of variables and multinomials of any degree. This is where things get slow, and I think I need a different approach.
My original (hard-coded) C evaluator looked like:
double evaluateStep(double x, double y, double z, double *coeffs) {
    double sum;
    sum = coeffs[0];
    sum += coeffs[1] * x;
    sum += coeffs[2] * y;
    sum += coeffs[3] * z;
    sum += coeffs[4] * x * y;
    sum += coeffs[5] * x * z;
    sum += coeffs[6] * y * z;
    sum += coeffs[7] * x * x;
    sum += coeffs[8] * y * y;
    sum += coeffs[9] * z * z;
    sum += coeffs[10] * x * y * z;
    sum += coeffs[11] * x * x * y;
    sum += coeffs[12] * x * x * z;
    sum += coeffs[13] * y * y * x;
    sum += coeffs[14] * y * y * z;
    sum += coeffs[15] * z * z * x;
    sum += coeffs[16] * z * z * y;
    sum += coeffs[17] * x * x * x;
    sum += coeffs[18] * y * y * y;
    sum += coeffs[19] * z * z * z;
    return sum;
}
The generalized code:
double recursiveEval(double factor, double *position, int ndim, double **coeffs, int order) {
    if (!order)
        return factor * *((*coeffs)++);
    double sum = 0;
    for (int i = 0; i < ndim; i++)
        sum += recursiveEval(factor * position[i], position + i, ndim - i, coeffs, order - 1);
    return sum;
}

double evaluateNDStep(double *position, double *coeffs) {
    double *coefficients = coeffs;
    return recursiveEval(1, position, NUM_DIMENSIONS + 1, &coefficients, ORDER);
}
Of course, I'm sure this has just destroyed my compiler (gcc)'s ability to do common subexpression elimination, and other similar optimizations. I'm wondering if there's a better approach I could take. There is one bit of information that I haven't been able to take advantage of yet- all the coefficients are drawn at the start of the program from a pool of 25 (with repetition). These span [-1.2, 1.2] inclusive if that makes any difference.
I've considered computing a single term of the multinomial, and then transforming it into each of the other terms by simple variableA/variableB multiplications, but the amount of error that introduces may be a bit too much. I've also considered computing all terms up to degree n, and using those to construct the degree n+1 through 2n terms. One other option, which I'm not sure how to implement, would be to factor the expression, taking advantage of the fact that many terms will share the same coefficient (pigeonhole principle), so it may be possible to bring it down to fewer required multiplications and additions than there are terms in the expanded expression. I have no clue how to go about doing this, though, since it would have to happen at run time, and that'd cross into the area of JIT compilation of the expression or something.
As it stands, 3 variables at degree 3 (20 terms total) takes almost 6 seconds, and 4 variables at degree 3 (35 terms total) takes a bit over 13 seconds. These will not scale well into larger degrees or more variables (the formula for the number of distinct terms is nck(degree + variables, degree), where nck is "n choose k"). I'm looking for a way to improve the performance of the generalized code, ideally asymptotically (I suspect via partial factorization). I don't really care about the language; I'm writing this code in C, but if you prefer to present an algorithm in some other language, that will not be a problem for me.
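For what it's worth, the "construct higher-degree terms from lower ones" idea can generate every monomial with a single multiply each; here is a rough sketch (the function name and the fixed MAXTERMS bound are mine, and I haven't benchmarked it):
#define MAXTERMS 256 /* must be >= nck(order + variables, variables) */

/* Build each degree-k monomial by extending a degree-(k-1) monomial with a
   variable whose index is >= the last variable used, which avoids
   duplicates, so every term costs exactly one multiply. Coefficients must
   be supplied in the same order the terms are generated. */
double evaluateIncremental(const double *position, int ndim,
                           const double *coeffs, int order) {
    double term[MAXTERMS]; /* value of each monomial */
    int last[MAXTERMS];    /* highest variable index used in that monomial */
    int lo = 0, hi = 1, n = 1;
    term[0] = 1.0;
    last[0] = 0;
    for (int d = 0; d < order; d++) {
        for (int t = lo; t < hi; t++)
            for (int v = last[t]; v < ndim; v++) {
                term[n] = term[t] * position[v]; /* one multiply per term */
                last[n] = v;
                n++;
            }
        lo = hi;
        hi = n;
    }
    double sum = 0.0;
    for (int t = 0; t < n; t++)
        sum += coeffs[t] * term[t];
    return sum;
}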

How to fix this floating point square root algorithm

I am trying to compute the IEEE-754 32-bit floating-point square root of various inputs, but for one particular input the below algorithm, based upon the Newton-Raphson method, won't converge. I am wondering what I can do to fix the problem? For the platform I am designing, I have a 32-bit floating-point adder/subtracter, multiplier, and divider.
For input 0x7F7FFFFF (3.4028234663852886E38), the algorithm won't converge to the correct answer of 18446743523953729536.000000; this algorithm's answer gives 18446743523953737728.000000.
I am using MATLAB to implement my code before I implement this in hardware. I can only use single-precision floating-point values (SO NO DOUBLES).
clc; clear; close all;
% Input
R = typecast(uint32(hex2dec(num2str(dec2hex(((hex2dec('7F7FFFFF'))))))),'single')
% Initial estimate
OneOverRoot2 = single(1/sqrt(2));
Root2 = single(sqrt(2));
% Get low and high bits of input R
hexdata_high = bitand(bitshift(hex2dec(num2hex(single(R))),-16),hex2dec('ffff'));
hexdata_low = bitand(hex2dec(num2hex(single(R))),hex2dec('ffff'));
% Change exponent of input to -1 to get Mantissa
temp = bitand(hexdata_high,hex2dec('807F'));
Expo = bitshift(bitand(hexdata_high,hex2dec('7F80')),-7);
hexdata_high = bitor(temp,hex2dec('3F00'));
b = typecast(uint32(hex2dec(num2str(dec2hex(((bitshift(hexdata_high,16)+ hexdata_low)))))),'single');
% If exponent is odd ...
if (bitand(Expo,1))
    % Pretend the mantissa [0.5 ... 1.0) is multiplied by 2 as Expo is odd,
    % so it now has the value [1.0 ... 2.0)
    % Estimate the sqrt(mantissa) as [1.0 ... sqrt(2))
    % IOW: linearly map (0.5 ... 1.0) to (1.0 ... sqrt(2))
    Mantissa = (Root2 - 1.0)/(1.0 - 0.5)*(b - 0.5) + 1.0;
else
    % The mantissa is in range [0.5 ... 1.0)
    % Estimate the sqrt(mantissa) as [1/sqrt(2) ... 1.0)
    % IOW: linearly map (0.5 ... 1.0) to (1/sqrt(2) ... 1.0)
    Mantissa = (1.0 - OneOverRoot2)/(1.0 - 0.5)*(b - 0.5) + OneOverRoot2;
end
newS = Mantissa*2^(bitshift(Expo-127,-1));
S=newS
% S = (S + R/S)/2 method
for j = 1:6
    fprintf('S %u %f %f\n', j, S, (S-sqrt(R)));
    S = single((single(S) + single(single(R)/single(S))))/2;
    S = single(S);
end
goodaccuracy = (abs((single(S)-single(sqrt(single(R)))))) < 2^-23
difference = (abs((single(S)-single(sqrt(single(R))))))
% Get hexadecimal output
hexdata_high = (bitand(bitshift(hex2dec(num2hex(single(S))),-16),hex2dec('ffff')));
hexdata_low = (bitand(hex2dec(num2hex(single(S))),hex2dec('ffff')));
fprintf('FLOAT: T Input: %e\t\tCorrect: %e\t\tMy answer: %e\n', R, sqrt(R), S);
fprintf('output hex = 0x%04X%04X\n',hexdata_high,hexdata_low);
out = hex2dec(num2hex(single(S)));
I took a whack at this. Here's what I came up with:
#include <math.h> // frexpf, ldexpf

float mysqrtf(float f) {
    if (f < 0) return 0.0f/0.0f;
    if (f == 1.0f / 0.0f) return f;
    if (f != f) return f;
    // half-ass an initial guess of 1.0.
    int expo;
    float foo = frexpf(f, &expo);
    float s = 1.0;
    if (expo & 1) foo *= 2, expo--;
    // this is the only case for which what's below fails.
    if (foo == 0x0.ffffffp+0) return ldexpf(0x0.ffffffp+0, expo/2);
    // do four Newton iterations.
    for (int i = 0; i < 4; i++) {
        float diff = s*s-foo;
        diff /= s;
        s -= diff/2;
    }
    // do one last Newton iteration, computing s*s-foo exactly.
    float scal = s >= 1 ? 4096 : 2048;
    float shi = (s + scal) - scal; // high 12 bits of significand
    float slo = s - shi;           // rest of significand
    float diff = shi * shi - foo;  // subtraction exact by Sterbenz's theorem
    diff += 2 * shi * slo;         // opposite signs; exact by Sterbenz's theorem
    diff += slo * slo;
    diff /= s;                     // diff == fma(s, s, -foo) / s.
    s -= diff/2;
    return ldexpf(s, expo/2);
}
The first thing to analyse is the formula (s*s-foo)/s in floating-point arithmetic. If s is a sufficiently good approximation to sqrt(foo), Sterbenz's theorem tells us that the numerator is within an ulp(foo) of the right answer --- all of that error is approximation error from computing s*s. Then we divide by s; this gives us at worst another half-ulp of approximation error. So, even without a fused multiply-add, diff is within 1.5 ulp of what it should be. And we divide it by two.
Notice that the initial guess doesn't in and of itself matter as long as you follow it up with enough Newton iterations.
Measure the error of an approximation s to sqrt(foo) by abs(s - foo/s). The error of my initial guess of 1 is at most 1. A Newton iteration in exact arithmetic squares the error and divides it by 4. A Newton iteration in floating-point arithmetic --- the kind I do four times --- squares the error, divides it by 4, and kicks in another 0.75 ulp of error. You do this four times and you find you have a relative error at most 0x0.000000C4018384, which is about 0.77 ulp. This means that four Newton iterations yield a faithfully-rounded result.
I do a fifth Newton step to get a correctly-rounded square root. The reason why it works is a little more intricate.
shi holds the "top half" of s while slo holds the "bottom half." The last 12 bits in each significand will be zero. This means, in particular, that shi * shi and shi * slo and slo * slo are exactly representable as floats.
s*s is within two ulps of foo. shi*shi is within 2047 ulps of s*s. Thus shi * shi - foo is within 2049 ulps of zero; in particular, it's exactly representable and less than 2^-10.
You can check that you can add 2 * shi * slo and get an exactly-representable result that's within 2^-22 of zero and then add slo*slo and get an exactly representable result --- s*s-foo computed exactly.
When you divide by s, you kick in an additional half-ulp of error, which is at most 2^-48 here since our error was already so small.
Now we do a Newton step. We've computed the current error correctly to within 2^-46. Adding half of it to s gives us the square root to within 3*2^-48.
To turn this into a guarantee of correct rounding, we need to prove that there are no floats between 1/2 and 2, other than the one I special-cased, whose square roots are within 3*2^-48 of a midpoint between two consecutive floats. You can do some error analysis, get a Diophantine equation, find all of the solutions of that Diophantine equation, find which inputs they correspond to, and work out what the algorithm does on those. (If you do this, there is one "physical" solution and a bunch of "unphysical" solutions. The one real solution is the only thing I special-cased.) There may be a cleaner way, however.
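If you want to convince yourself empirically, float is small enough to check the whole reduced range [0.5, 2) exhaustively. A sketch (it assumes the mysqrtf() above and compares against the platform's sqrtf(), which must itself be correctly rounded for the check to mean anything):
#include <stdio.h>
#include <string.h>
#include <math.h>

int main(void) {
    unsigned int start, end, bad = 0;
    float lo = 0.5f, hi = 2.0f;
    memcpy(&start, &lo, sizeof start);
    memcpy(&end, &hi, sizeof end);
    // consecutive bit patterns enumerate consecutive positive floats
    for (unsigned int u = start; u < end; u++) {
        float x;
        memcpy(&x, &u, sizeof x);
        if (mysqrtf(x) != sqrtf(x)) bad++;
    }
    printf("%u mismatches\n", bad);
    return 0;
}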

Fastest way to sort vectors by angle without actually computing that angle

Many algorithms (e.g. Graham scan) require points or vectors to be sorted by their angle (perhaps as seen from some other point, i.e. using difference vectors). This order is inherently cyclic, and where this cycle is broken to compute linear values often doesn't matter that much. But the real angle value doesn't matter much either, as long as cyclic order is maintained. So doing an atan2 call for every point might be wasteful. What faster methods are there to compute a value which is strictly monotonic in the angle, the way atan2 is? Such functions apparently have been called “pseudoangle” by some.
I started to play around with this and realised that the spec is kind of incomplete. atan2 has a discontinuity, because as dx and dy are varied, there's a point where atan2 will jump between -pi and +pi. The graph below shows the two formulas suggested by @MvG, and in fact they both have the discontinuity in a different place compared to atan2. (NB: I added 3 to the first formula and 4 to the alternative so that the lines don't overlap on the graph). If I added atan2 to that graph then it would be the straight line y=x. So it seems to me that there could be various answers, depending on where one wants to put the discontinuity. If one really wants to replicate atan2, the answer (in this genre) would be
# Input:  dx, dy: coordinates of a (difference) vector.
# Output: a number from the range [-2 .. 2] which is monotonic
#         in the angle this vector makes against the x axis,
#         and with the same discontinuity as atan2
def pseudoangle(dx, dy):
    p = dx/(abs(dx)+abs(dy)) # -1 .. 1 increasing with x
    if dy < 0: return p - 1  # -2 .. 0 increasing with x
    else:      return 1 - p  #  0 .. 2 decreasing with x
This means that if the language that you're using has a sign function, you could avoid branching by returning sign(dy)*(1-p), which has the effect of putting an answer of 0 at the discontinuity between returning -2 and +2. And the same trick would work with @MvG's original methodology; one could return sign(dx)*(p-1).
Update In a comment below, @MvG suggests a one-line C implementation of this, namely
pseudoangle = copysign(1. - dx/(fabs(dx)+fabs(dy)), dy)
@MvG says it works well, and it looks good to me :-).
I know one possible such function, which I will describe here.
# Input:  dx, dy: coordinates of a (difference) vector.
# Output: a number from the range [-1 .. 3] (or [0 .. 4] with the comment enabled)
#         which is monotonic in the angle this vector makes against the x axis.
def pseudoangle(dx, dy):
    ax = abs(dx)
    ay = abs(dy)
    p = dy/(ax+ay)
    if dx < 0: p = 2 - p
    # elif dy < 0: p = 4 + p
    return p
So why does this work? One thing to note is that scaling all input lengths will not affect the output. So the length of the vector (dx, dy) is irrelevant; only its direction matters. Concentrating on the first quadrant, we may for the moment assume dx == 1. Then dy/(1+dy) grows monotonically from zero for dy == 0 to one for infinite dy (i.e. for dx == 0). Now the other quadrants have to be handled as well. If dy is negative, then so is the initial p. So for positive dx we already have a range -1 <= p <= 1 monotonic in the angle. For dx < 0 we change the sign and add two. That gives a range 1 <= p <= 3 for dx < 0, and a range of -1 <= p <= 3 on the whole. If negative numbers are for some reason undesirable, the elif comment line can be included, which will shift the 4th quadrant from -1…0 to 3…4.
I don't know if the above function has an established name, and who might have published it first. I've gotten it quite a while ago and copied it from one project to the next. I have however found occurrences of this on the web, so I'd consider this snippet public enough for re-use.
There is a way to obtain the range [0 … 4] (for real angles [0 … 2π]) without introducing a further case distinction:
# Input:  dx, dy: coordinates of a (difference) vector.
# Output: a number from the range [0 .. 4] which is monotonic
#         in the angle this vector makes against the x axis.
def pseudoangle(dx, dy):
    p = dx/(abs(dx)+abs(dy)) # -1 .. 1 increasing with x
    if dy < 0: return 3 + p  #  2 .. 4 increasing with x
    else:      return 1 - p  #  0 .. 2 decreasing with x
I kinda like trigonometry, so I know the best way of mapping an angle to some values we usually have is a tangent. Of course, if we want a finite number in order to not have the hassle of comparing {sign(x),y/x}, it gets a bit more confusing.
But there is a function that maps [1,+inf[ to ]0,1], known as the inverse, that will allow us to have a finite range to which we will map angles. The inverse of the tangent is the well-known cotangent, thus x/y (yes, it's as simple as that).
A little illustration, showing the values of tangent and cotangent on a unit circle:
You see the values are the same when |x| = |y|, and you see also that if we color the parts that output a value between [-1,1] on both circles, we manage to color a full circle. To have this mapping of values be continuous and monotonic, we can do three things:
use the opposite of the cotangent to have the same monotonicity as tangent
add 2 to -cotan, to have the values coincide where tan=1
add 4 to one half of the circle (say, below the x=-y diagonal) to have the values fit at one of the discontinuities.
That gives the following piecewise function, which is a continuous and monotonic function of the angle, with only one discontinuity (which is at the minimum):
double pseudoangle(double dx, double dy)
{
    // 1 for above, 0 for below the diagonal/anti-diagonal
    int diag = dx > dy;
    int adiag = dx > -dy;
    double r = !adiag ? 4 : 0;
    if (dy == 0)
        return r;
    if (diag ^ adiag)
        r += 2 - dx / dy;
    else
        r += dy / dx;
    return r;
}
Note that this is very close to Fowler angles, with the same properties. Formally, (pseudoangle(dx,dy) + 1) % 8 == Fowler(dx,dy).
To talk performance, it's much less branchy than Fowler's code (and generally less complicated, imo). Compiled with -O3 on gcc 6.1.1, the above function generates assembly code with 4 branches, two of which come from dy == 0 (one checking if the two operands are "unordered", i.e. whether dy was NaN, and the other checking if they are equal).
I would argue this version is more precise than the others, since it only uses mantissa-preserving operations until shifting the result into the right interval. This should be especially visible when |x| << |y| or |x| >> |y|; there the operation |x| + |y| loses quite some precision.
As you can see on the graph the angle-pseudoangle relation is also nicely close to linear.
Looking where branches come from, we can make the following remarks:
My code doesn't rely on abs nor copysign, which makes it look more self-contained. However, playing with sign bits on floating-point values is actually rather trivial, since it's just flipping a separate bit (no branch!), so this is more of a disadvantage.
Furthermore, other solutions proposed here do not check whether abs(dx) + abs(dy) == 0 before dividing by it, whereas this version would fail as soon as a single component (dy) is 0 -- so that check throws in a branch (or two in my case).
If we choose to get roughly the same result (up to rounding errors) but without branches, we could abuse copysign and write:
double pseudoangle(double dx, double dy)
{
    double s = dx + dy;
    double d = dx - dy;
    double r = 2 * (1.0 - copysign(1.0, s));
    double xor_sign = copysign(1.0, d) * copysign(1.0, s);
    r += (1.0 - xor_sign);
    r += (s - xor_sign * d) / (d + xor_sign * s);
    return r;
}
Bigger errors may happen than with the previous implementation, due to cancellation in either d or s when dx and dy are close in absolute value. There is no check for division by zero, to stay comparable with the other implementations presented; besides, it can only happen when both dx and dy are 0.
If you can feed the original vectors instead of angles into a comparison function when sorting, you can make it work with:
Just a single branch.
Only floating point comparisons and multiplications.
Avoiding addition and subtraction makes it numerically much more robust. A double can actually always exactly represent the product of two floats, but not necessarily their sum. This means for single precision input you can guarantee a perfect flawless result with little effort.
This is basically Cimbali's solution repeated for both vectors, with branches eliminated and divisions multiplied away. It returns an integer, with sign matching the comparison result (positive, negative or zero):
signed int compare(double x1, double y1, double x2, double y2) {
    unsigned int d1 = x1 > y1;
    unsigned int d2 = x2 > y2;
    unsigned int a1 = x1 > -y1;
    unsigned int a2 = x2 > -y2;
    // Quotients of both angles.
    unsigned int qa = d1 * 2 + a1;
    unsigned int qb = d2 * 2 + a2;
    if(qa != qb) return((0x6c >> qa * 2 & 6) - (0x6c >> qb * 2 & 6));
    d1 ^= a1;
    double p = x1 * y2;
    double q = x2 * y1;
    // Numerator of each remainder, multiplied by denominator of the other.
    double na = q * (1 - d1) - p * d1;
    double nb = p * (1 - d1) - q * d1;
    // Return signum(na - nb)
    return((na > nb) - (na < nb));
}
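As a usage note, this comparator slots straight into qsort (a sketch; the Point struct and wrapper are illustrative additions):
#include <stdlib.h>

typedef struct { double x, y; } Point;

// Adapter from qsort's void* protocol to compare() above.
static int cmp_by_angle(const void *pa, const void *pb) {
    const Point *a = pa, *b = pb;
    return compare(a->x, a->y, b->x, b->y);
}

// usage: qsort(points, n, sizeof(Point), cmp_by_angle);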
The simplest thing I came up with is making normalized copies of the points and splitting the circle around them in half along the x or y axis. Then use the opposite axis as a linear value between the beginning and end of the top or bottom buffer (one buffer will need to be filled in reverse linear order). Then you can read the first and second buffer linearly and it will be clockwise, or the second and first in reverse for counter-clockwise.
That might not be a good explanation, so I put some code up on GitHub that uses this method to sort points, with an epsilon value to size the arrays.
https://github.com/Phobos001/SpatialSort2D
This might not be good for your use case because it's built for performance in graphics effects rendering, but it's fast and simple (O(N) complexity). If you're working with really small changes in points or very large (hundreds of thousands) data sets, then this won't work, because the memory usage might outweigh the performance benefits.
Nice.. here is a variant that returns -Pi .. Pi like many atan2 functions.
Edit note: changed my pseudocode to proper Python.. arg order changed for compatibility with Python's math module atan2(). Edit 2: more code to catch the case dx=0.
def pseudoangle(dy, dx):
    """ returns approximation to math.atan2(dy,dx)*2/pi """
    if dx == 0:
        s = cmp(dy, 0)
    else:
        s = cmp(dx*dy, 0)  # cmp == "sign" in many other languages.
    if s == 0: return 0    # doesn't hurt performance much, but can omit if (0,0) never happens
    p = dy/(dx + s*dy)
    if dx < 0: return p - 2*s
    return p
In this form the max error is only ~0.07 radian for all angles.
(of course leave out the Pi/2 if you don't care about the magnitude.)
Now for the bad news -- on my system, using python math.atan2 is about 25% faster.
Obviously, simple interpreted code doesn't beat a compiled intrinsic.
If angles are not needed by themselves, but only for sorting, then @jjrv's approach is the best one. Here is a comparison in Julia:
using StableRNGs
using BenchmarkTools
# Definitions
struct V{T}
x::T
y::T
end
function pseudoangle(v)
copysign(1. - v.x/(abs(v.x)+abs(v.y)), v.y)
end
function isangleless(v1, v2)
a1 = abs(v1.x) + abs(v1.y)
a2 = abs(v2.x) + abs(v2.y)
a2*copysign(a1 - v1.x, v1.y) < a1*copysign(a2 - v2.x, v2.y)
end
# Data
rng = StableRNG(2021)
vectors = map(x -> V(x...), zip(rand(rng, 1000), rand(rng, 1000)))
# Comparison
res1 = sort(vectors, by = x -> pseudoangle(x));
res2 = sort(vectors, lt = (x, y) -> isangleless(x, y));
#assert res1 == res2
#btime sort($vectors, by = x -> pseudoangle(x));
# 110.437 μs (3 allocations: 23.70 KiB)
#btime sort($vectors, lt = (x, y) -> isangleless(x, y));
# 65.703 μs (3 allocations: 23.70 KiB)
So, by avoiding division, time is almost halved without losing result quality. Of course, for more precise calculations, isangleless should be equipped with bigfloat from time to time, but the same can be told about pseudoangle.
Just use a cross-product function. The direction you rotate one segment relative to the other will give either a positive or negative number. No trig functions and no division. Fast and simple. Just Google it.
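One caveat worth spelling out: the sign of the cross product orders two vectors only within a half-plane, so for a full 0..2pi sort it has to be combined with a quadrant or half-plane test, as compare() above does. The core really is this small (sketch):
// > 0 if (x2,y2) lies counter-clockwise of (x1,y1), < 0 if clockwise,
// 0 if collinear; only a valid comparator within one half-plane.
double cross2d(double x1, double y1, double x2, double y2)
{
    return x1 * y2 - y1 * x2;
}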

shoot projectile (straight trajectory) at moving target in 3 dimensions

I already googled for the problem but only found either 2D solutions or formulas that didn't work for me (I found this formula that looks nice: http://www.ogre3d.org/forums/viewtopic.php?f=10&t=55796 but it seems not to be correct).
I have given:
Vec3 cannonPos;
Vec3 targetPos;
Vec3 targetVelocityVec;
float bulletSpeed;
What I'm looking for is the time t such that
targetPos + t*targetVelocityVec
is the intersection point where to aim the cannon and shoot.
I'm looking for a simple, inexpensive formula for t (by simple I just mean not making many unnecessary vector-space transformations and the like).
Thanks!
The real problem is finding out where in space that the bullet can intersect the targets path. The bullet speed is constant, so in a certain amount of time it will travel the same distance regardless of the direction in which we fire it. This means that it's position after time t will always lie on a sphere. Here's an ugly illustration in 2d:
This sphere can be expressed mathematically as:
(x-x_b0)^2 + (y-y_b0)^2 + (z-z_b0)^2 = (bulletSpeed * t)^2 (eq 1)
x_b0, y_b0 and z_b0 denote the position of the cannon. You can find the time t by solving this equation for t using the equation provided in your question:
targetPos+t*targetVelocityVec (eq 2)
(eq 2) is a vector equation and can be decomposed into three separate equations:
x = x_t0 + t * v_x
y = y_t0 + t * v_y
z = z_t0 + t * v_z
These three equations can be inserted into (eq 1):
(x_t0 + t * v_x - x_b0)^2 + (y_t0 + t * v_y - y_b0)^2 + (z_t0 + t * v_z - z_b0)^2 = (bulletSpeed * t)^2
This equation contains only known variables and can be solved for t. By assigning the constant part of the quadratic subexpressions to constants we can simplify the calculation:
c_1 = x_t0 - x_b0
c_2 = y_t0 - y_b0
c_3 = z_t0 - z_b0
(v_b = bulletSpeed)
(t * v_x + c_1)^2 + (t * v_y + c_2)^2 + (t * v_z + c_3)^2 = (v_b * t)^2
Rearrange it as a standard quadratic equation:
(v_x^2+v_y^2+v_z^2-v_b^2)t^2 + 2*(v_x*c_1+v_y*c_2+v_z*c_3)t + (c_1^2+c_2^2+c_3^2) = 0
This is easily solvable using the standard formula. It can result in zero, one or two solutions. Zero solutions (not counting complex solutions) means that there's no possible way for the bullet to reach the target. One solution will probably happen very rarely, when the target trajectory intersects with the very edge of the sphere. Two solutions will be the most common scenario. A negative solution means that you can't hit the target, since you would need to fire the bullet into the past. These are all conditions you'll have to check for.
When you've solved the equation you can find the position of t by putting it back into (eq 2). In pseudo code:
# setup all needed variables
c_1 = x_t0 - x_b0
c_2 = y_t0 - y_b0
c_3 = z_t0 - z_b0
v_b = bulletSpeed
# ... and so on
a = v_x^2+v_y^2+v_z^2-v_b^2
b = 2*(v_x*c_1+v_y*c_2+v_z*c_3)
c = c_1^2+c_2^2+c_3^2
if b^2 < 4*a*c:
    # no real solutions
    raise error
p = -b/(2*a)
q = sqrt(b^2 - 4*a*c)/(2*a)
t1 = p-q
t2 = p+q
if t1 < 0 and t2 < 0:
    # no positive solutions, all possible trajectories are in the past
    raise error
# we want to hit it at the earliest possible (non-negative) time
if t1 < 0: t = t2
elif t2 < 0: t = t1
else: t = min(t1, t2)
# calculate point of collision
x = x_t0 + t * v_x
y = y_t0 + t * v_y
z = z_t0 + t * v_z
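For reference, a compact C version of that recipe might look like this (a sketch; Vec3 and interceptTime are my names, and the degenerate equal-speed case is handled as a linear equation):
#include <math.h>
#include <stdbool.h>

typedef struct { double x, y, z; } Vec3;

// Returns true and the earliest non-negative hit time in *tOut,
// or false if the target cannot be hit.
bool interceptTime(Vec3 cannon, Vec3 target, Vec3 vel,
                   double bulletSpeed, double *tOut)
{
    double c1 = target.x - cannon.x;
    double c2 = target.y - cannon.y;
    double c3 = target.z - cannon.z;
    double a = vel.x*vel.x + vel.y*vel.y + vel.z*vel.z
             - bulletSpeed*bulletSpeed;
    double b = 2.0 * (vel.x*c1 + vel.y*c2 + vel.z*c3);
    double c = c1*c1 + c2*c2 + c3*c3;
    if (fabs(a) < 1e-12) { // bullet and target speeds equal: linear equation
        if (fabs(b) < 1e-12) return false;
        double t = -c / b;
        if (t < 0.0) return false;
        *tOut = t;
        return true;
    }
    double disc = b*b - 4.0*a*c;
    if (disc < 0.0) return false; // no real solutions: unreachable
    double p = -b / (2.0*a);
    double q = sqrt(disc) / (2.0*a);
    double t1 = p - q, t2 = p + q;
    double t = fmin(t1, t2);      // earliest non-negative time
    if (t < 0.0) t = fmax(t1, t2);
    if (t < 0.0) return false;    // both intercepts lie in the past
    *tOut = t;                    // collision point: target + t*vel
    return true;
}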
