Fast small-angle sine/cosine approximation

Fast small-angle sine/cosine approximation - performance

I'm writing a rigid-body rotation dynamics simulation, which means I have to compute many rotations by small angles, and the evaluation of the trigonometric functions is the performance bottleneck. Currently I do it with a Taylor (Maclaurin) series:
class double2{
double x,y;
// Intrinsic full sin/cos
final void rotate ( double a){
double x_=x;
double ca=Math.cos(a); double sa=Math.sin(a);
x=ca*x_-sa*y; y=sa*x_+ca*y;
}
// Taylor 7th-order approximation
final void rotate_d7( double a){
double x_=x;
double a2=a*a;
double a4=a2*a2;
double a6=a4*a2;
double ca= 1.0d - a2 /2.0d + a4 /24.0d - a6/720.0d;
double sa= a - a2*a/6.0d + a4*a/120.0d - a6*a/5040.0d;
x=ca*x_-sa*y; y=sa*x_+ca*y;
}
}
but the trade-off between accuracy and speed is not as good as I expected:
error (100x dphi=Pi/100) time [ns per rotation]
v.rotate_d1() : -0.010044860504615213 9.314306 ns/op
v.rotate_d3() : 3.2624666136960023E-6 16.268745 ns/op
v.rotate_d5() : -4.600003294941146E-10 35.433617 ns/op
v.rotate_d7() : 3.416711358283919E-14 49.831547 ns/op
v.rotate() : 3.469446951953614E-16 75.70213 ns/op
Is there a faster way to evaluate an approximation of sin() and cos() for small angles (say < Pi/100)?
I was thinking of some rational series or a continued-fraction approximation. Do you know any? (A precomputed table doesn't make sense here.)

You might find that adjusting your calculations can improve performance. E.g.:
final double c7 = -1/5040d;
final double c5 = 1/120d;
final double c3 = -1/6d;
double a2 = a * a;
double sa = (((c7 * a2 + c5) * a2 + c3) * a2 + 1) * a;
// similarly for cos
Now the optimiser might be doing some of this itself anyway, so your mileage may vary. Would be interested to know the results either way.
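To make the nested (Horner) form concrete, here is a minimal sketch in Python rather than Java, purely for brevity. The function names `sin_small` and `cos_small` are made up for this example, and the cosine coefficients are simply the Taylor terms from the question. It only shows the evaluation order; whether it is actually faster in the JVM would have to be measured.

```python
import math

# Taylor coefficients for sine (odd powers) and cosine (even powers)
S3, S5, S7 = -1.0 / 6.0, 1.0 / 120.0, -1.0 / 5040.0
C2, C4, C6 = -1.0 / 2.0, 1.0 / 24.0, -1.0 / 720.0


def sin_small(a):
    """7th-order Taylor sine in Horner form, intended for small |a|."""
    a2 = a * a
    return (((S7 * a2 + S5) * a2 + S3) * a2 + 1.0) * a


def cos_small(a):
    """6th-order Taylor cosine in Horner form, intended for small |a|."""
    a2 = a * a
    return ((C6 * a2 + C4) * a2 + C2) * a2 + 1.0


def rotate(x, y, a):
    """Rotate the point (x, y) by the small angle a using the approximations."""
    ca, sa = cos_small(a), sin_small(a)
    return ca * x - sa * y, sa * x + ca * y
```

Each polynomial needs one multiplication for `a2` plus one multiply-add per coefficient, instead of building `a4` and `a6` separately and dividing by constants.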

Instead of optimizing the trig functions, see if you can do without them. Rigid-body simulations tend to be a perfectly natural fit for vector math.

Two ways: reduce the precision if possible (as is common in video games, use the minimum acceptable precision if you are aiming for performance),
then try tabulated values. Once per execution (when the game loads?), compute an array of sine/cosine values that you can then access in constant time.
float cosAlpha = COSINUS[(int)(k*alpha)]; // e.g: k = 1000
Tune k and the array size to trade angle resolution against memory footprint.
edit: Don't forget to use the parity (symmetry) of the cosine/sine functions to avoid duplicate values in the table.
edit2: Try floats instead of doubles. The difference will be insignificant for the player, and the performance impact may be interesting. Test it!

Can you add some inline assembler? Targeting the i386 'fsincos' instruction is probably the fastest method:
Vector2 unit_vector ( Angle angle ) {
Vector2 r;
//now the normal processor detection
//and various platform specific versions
# if defined (__i386__) && !defined (NO_ASM)
# if defined __GNUC__
# define ASM_SINCOS
asm ("fsincos" : "=t" (r.x), "=u" (r.y) : "0" (angle.radians()));
# elif defined _MSC_VER
# define ASM_SINCOS
double a = angle.radians();
__asm fld a
__asm fsincos
__asm fstp r.x
__asm fstp r.y
# endif
# endif
}
from here.
This has the added bonus of calculating both sin and cos in a single call.
EDIT : it's Java.
Are your rotations suitably self-contained that you can offload thousands at a time over JNI? Otherwise this hardware-specific approach is no good.

For small x (x<0.2 in radians) you can safely assume sin(x) = x.
The maximum deviation is 0.0013.

Related

eigen matrices multiplication optimizations, Householder precision

I am developing code that performs a Schur decomposition.
I test it against the corresponding Eigen functionality.
I found that my code gives results different from Eigen's.
Most matrix elements of my output and Eigen's agree to 2 to 4 decimal places.
However, the worst difference is about 70%: for example, my code gives a certain matrix element as 0.3 and Eigen gives 0.19.
I decided to look deeper into the Eigen sources and found that if I change the following Eigen code
void ::applyHouseholderOnTheLeft(....)
{
......
Map<typename internal::plain_row_type<PlainObject>::type> tmp(workspace,cols());
Block<Derived, EssentialPart::SizeAtCompileTime, Derived::ColsAtCompileTime> bottom(derived(), 1, 0, rows()-1, cols());
tmp.noalias() = essential.adjoint() * bottom;
tmp += this->row(0);
this->row(0) -= tau * tmp;
bottom.noalias() -= tau * essential * tmp;
}
to this one (same was done for applyHouseholderOnTheRight):
void ::applyHouseholderOnTheLeft(....)
{
......
Map<typename internal::plain_row_type<PlainObject>::type> tmp(workspace,cols());
Block<Derived, EssentialPart::SizeAtCompileTime, Derived::ColsAtCompileTime> bottom(derived(), 1, 0, rows()-1, cols());
tmp.noalias() = essential.adjoint() * bottom;
tmp += this->row(0);
tmp *= tau;
this->row(0) -= tmp;
bottom.noalias() -= essential * tmp;
}
I get Eigen output equal to mine (to within 6-7 decimal places)!
Mathematically, these two pieces of code are equivalent.
So the question is: why is there such a big difference between the outputs of equivalent code?
And which result is actually correct (0.3 or 0.19 :-) )?
Original test code:
Matrix<double, Dynamic, Dynamic, RowMajor> A(10,10);
A<<6.9 ,4.8 ,9.5 ,3.1 ,6.5 ,5.8 ,-0.9 ,-7.3 ,-8.1 ,3.0 ,0.1 ,9.9 ,-3.2 ,6.4 ,6.2 ,-7.0 ,5.5 ,-2.2 ,-4.0 ,3.7 ,-3.6 ,9.0 ,-1.4 ,-2.4 ,1.7 ,-6.1 ,-4.2 , -2.5 ,-5.6 ,-0.4 ,0.4 ,9.1 ,-2.1 ,-5.4 ,7.3 ,3.6 ,-1.7 ,-5.7 ,-8.0 ,8.8 ,-3.0 ,-0.5 ,1.1 ,10.0 ,8.0 ,0.8 ,1.0 ,7.5 ,3.5 ,-1.8 ,0.3 ,-0.6 ,-6.3 ,-4.5 , -1.1 ,1.8 ,0.6 ,9.6 ,9.2 ,9.7 ,-2.6 ,4.3 ,-3.4 ,0.0 ,-6.7 ,5.0 ,10.5 ,1.5 ,-7.8 ,-4.1 ,-5.3 ,-5.0 ,2.0 ,-4.4 ,-8.4 ,6.0 ,-9.4 ,-4.8 ,8.2 ,7.8 ,5.2 ,-9.5 , -3.9 ,0.2 ,6.8 ,5.7 ,-8.5 ,-1.9 ,-0.3 ,7.4 ,-8.7 ,7.2 ,1.3 ,6.3 ,-3.7 ,3.9 ,3.3 ,-6.0 ,-9.1 ,5.9;
RealSchur<Matrix<double, Dynamic, Dynamic, RowMajor>>schur(A);
Matrix<double, Dynamic, Dynamic, RowMajor> T = schur.matrixT();
// and ,for example, element T(row_0, col_2) has notable difference: 0.19 (first code), 0.3 (second code)
P.S. Vectorization in Eigen is disabled in my case (the macro EIGEN_DONT_VECTORIZE is defined).

Explanation of the calc_delta_mine function

I am currently reading "Linux Kernel Development" by Robert Love, and I have a few questions about the CFS.
My question is how calc_delta_mine calculates:
delta_exec_weighted= (delta_exec * weight)/lw->weight
I guess it is done in two steps:
calculating (delta_exec * 1024):
if (likely(weight > (1UL << SCHED_LOAD_RESOLUTION)))
tmp = (u64)delta_exec * scale_load_down(weight);
else
tmp = (u64)delta_exec;
calculating /lw->weight (or * lw->inv_weight):
if (!lw->inv_weight) {
unsigned long w = scale_load_down(lw->weight);
if (BITS_PER_LONG > 32 && unlikely(w >= WMULT_CONST))
lw->inv_weight = 1;
else if (unlikely(!w))
lw->inv_weight = WMULT_CONST;
else
lw->inv_weight = WMULT_CONST / w;
}
/*
* Check whether we'd overflow the 64-bit multiplication:
*/
if (unlikely(tmp > WMULT_CONST))
tmp = SRR(SRR(tmp, WMULT_SHIFT/2) * lw->inv_weight,
WMULT_SHIFT/2);
else
tmp = SRR(tmp * lw->inv_weight, WMULT_SHIFT);
return (unsigned long)min(tmp, (u64)(unsigned long)LONG_MAX);
The SRR (shift right and round) macro is defined as:
#define SRR(x, y) (((x) + (1UL << ((y) - 1))) >> (y))
And the other macros are defined as:
#if BITS_PER_LONG == 32
# define WMULT_CONST (~0UL)
#else
# define WMULT_CONST (1UL << 32)
#endif
#define WMULT_SHIFT 32
Can someone please explain exactly how SRR works and how this code checks for overflow of the 64-bit multiplication?
And please explain the definitions of the macros in this function ((~0UL), (1UL << 32)).

The code you posted is basically doing calculations using 32.32 fixed-point arithmetic, where a single 64-bit quantity holds the integer part of the number in the high 32 bits, and the fractional part of the number in the low 32 bits (so, for example, 1.5 is 0x0000000180000000 in this system). WMULT_CONST is thus an approximation of 1.0 (using a value that can fit in a long for platform efficiency considerations), and so dividing WMULT_CONST by w computes 1/w as a 32.32 value.
Note that multiplying two 32.32 values together as integers produces a result that is 2^32 times too large; thus, WMULT_SHIFT (=32) is the right shift value needed to normalize the result of multiplying two 32.32 values together back down to 32.32.
The necessity of using this improved precision for scheduling purposes is explained in a comment in sched/sched.h:
/*
* Increase resolution of nice-level calculations for 64-bit architectures.
* The extra resolution improves shares distribution and load balancing of
* low-weight task groups (eg. nice +19 on an autogroup), deeper taskgroup
* hierarchies, especially on larger systems. This is not a user-visible change
* and does not change the user-interface for setting shares/weights.
*
* We increase resolution only if we have enough bits to allow this increased
* resolution (i.e. BITS_PER_LONG > 32). The costs for increasing resolution
* when BITS_PER_LONG <= 32 are pretty high and the returns do not justify the
* increased costs.
*/
As for SRR, mathematically, it computes the rounded result of x / 2^y.
To round the result of a division x/q you can calculate x + q/2 floor-divided by q; this is what SRR does by calculating x + 2^(y-1) floor-divided by 2^y.
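The rounding shift and the reciprocal multiplication can be tried out with plain integers. This is only a sketch in Python of the 64-bit case: `calc_delta_sketch` is a made-up name, `scale_load_down`, the `inv_weight` caching and the final clamp to LONG_MAX are left out, and it assumes 0 < lw_weight < 2^32.

```python
WMULT_CONST = 1 << 32   # the 64-bit definition quoted in the question
WMULT_SHIFT = 32


def srr(x, y):
    """Shift right and round: floor((x + 2**(y - 1)) / 2**y)."""
    return (x + (1 << (y - 1))) >> y


def calc_delta_sketch(delta_exec, weight, lw_weight):
    """Approximate delta_exec * weight / lw_weight via a 32.32 reciprocal.

    Simplified illustration only, not the kernel function itself.
    """
    inv_weight = WMULT_CONST // lw_weight      # ~ 1 / lw_weight as 32.32
    tmp = delta_exec * weight
    if tmp > WMULT_CONST:
        # Shift in two halves so the intermediate product stays small enough
        half = WMULT_SHIFT // 2
        return srr(srr(tmp, half) * inv_weight, half)
    return srr(tmp * inv_weight, WMULT_SHIFT)
```

For instance, `srr(5, 1)` is 3 (2.5 rounded up) while `srr(5, 2)` is 1 (1.25 rounded down). Splitting the shift into two halves in the large-`tmp` branch gives up some low-order bits in exchange for keeping the product within 64 bits in the C version.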

Memory and execution speed in Matlab

I am trying to create random lines and select some of them, which are really rare. My code is rather simple, but to get something I can use I need to create very large vectors (i.e. <100000000 x 1, the tracks variable in my code). Is there any way to create larger vectors and to reduce the time needed for all those calculations?
My code is
%Initial line values
tracks=input('Give me the number of muon tracks: ');
width=1e-4;
height=2e-4;
Ystart=15.*ones(tracks,1);
Xstart=-40+80.*rand(tracks,1);
%Xend=-40+80.*rand(tracks,1);
Xend=laprnd(tracks,1,Xstart,15);
X=[Xstart';Xend'];
Y=[Ystart';zeros(1,tracks)];
b=(Ystart.*Xend)./(Xend-Xstart);
hot=0;
cold=0;
for i=1:tracks
if ((Xend(i,1)<width/2 && Xend(i,1)>-width/2)||(b(i,1)<height && b(i,1)>0))
plot(X(:, i),Y(:, i),'r');%the chosen ones!
hold all
hot=hot+1;
else
%plot(X(:, i),Y(:, i),'b');%the rest of them
%hold all
cold=cold+1;
end
end
I am also calling a Laplace distribution generator made by Elvis Chen, which can be found here
function y = laprnd(m, n, mu, sigma)
%LAPRND generate i.i.d. laplacian random number drawn from laplacian distribution
% with mean mu and standard deviation sigma.
% mu : mean
% sigma : standard deviation
% [m, n] : the dimension of y.
% Default mu = 0, sigma = 1.
% For more information, refer to
% http://en.wikipedia.org./wiki/Laplace_distribution
% Author : Elvis Chen (bee33#sjtu.edu.cn)
% Date : 01/19/07
%Check inputs
if nargin < 2
error('At least two inputs are required');
end
if nargin == 2
mu = 0; sigma = 1;
end
if nargin == 3
sigma = 1;
end
% Generate Laplacian noise
u = rand(m, n)-0.5;
b = sigma / sqrt(2);
y = mu - b * sign(u).* log(1- 2* abs(u));
The result plot is

As you indicate, your problem is two-fold. On the one hand, you have memory issues because you need to do so many trials. On the other hand, you have performance issues, because you have to process all those trials.
Solutions to each issue often have a negative impact on the other issue. IMHO, the best approach would be to find a compromise.
More trials are only possible if you get rid of those gargantuan arrays that are required for vectorization, and use a different strategy to do the loop. I will give priority to the possibility of using more trials, possibly at the cost of optimal performance.
When I execute your code as-is in the Matlab profiler, it immediately shows that the initial memory allocation for all your variables takes a lot of time. It also shows that the plot and hold all commands are the most time-consuming lines of them all. Some more trial-and-error shows that there is a disappointingly low maximum value for the trials you can do before OUT OF MEMORY errors start appearing.
The loop can be accelerated tremendously if you know a few things about its limitations in Matlab. In older versions of Matlab, it used to be true that loops should be avoided completely in favor of 'vectorized' code. In recent versions (I believe R2008a and up), the Mathworks introduced a piece of technology called the JIT accelerator (Just-in-Time compiler) which translates M-code into machine language on the fly during execution. Simply put, the JIT accelerator allows your code to bypass Matlab's interpreter and talk much more directly with the underlying hardware, which can save a lot of time.
The advice you'll often hear, that loops should be avoided in Matlab, is no longer generally true. While vectorization still has its value, any procedure of sizable complexity that is implemented using only vectorized code is often illegible, hard to understand, hard to change and hard to maintain. An implementation of the same procedure that uses loops often has none of these drawbacks, and moreover, it will quite often be faster and require less memory.
Unfortunately, the JIT accelerator has a few nasty (and IMHO, unnecessary) limitations that you'll have to learn about.
One such thing is plot; it's generally a better idea to let a loop do nothing other than collect and manipulate data, and delay any plotting commands etc. until after the loop.
Another such thing is hold; the hold function is not a Matlab built-in function, meaning, it is implemented in M-language. Matlab's JIT accelerator is not able to accelerate non-builtin functions when used in a loop, meaning, your entire loop will run at Matlab's interpretation speed, rather than machine-language speed! Therefore, also delay this command until after the loop :)
Now, in case you're wondering, this last step can make a HUGE difference -- I know of one case where copy-pasting a function body into the upper-level loop caused a 1200x performance improvement. Days of execution time had been reduced to minutes!
There is actually another minor issue in your loop (which is really small, and rather inconvenient, I will immediately agree with) -- the name of the loop variable should not be i. The name i is the name of the imaginary unit in Matlab, and the name resolution will also unnecessarily consume time on each iteration. It's small, but non-negligible.
Now, considering all this, I've come to the following implementation:
function [hot, cold, h] = MuonTracks(tracks)
% NOTE: no variables larger than 1x1 are initialized
width = 1e-4;
height = 2e-4;
% constant used for Laplacian noise distribution
bL = 15 / sqrt(2);
% Loop through all tracks
X = [];
hot = 0;
ii = 0;
while ii < tracks
ii = ii + 1;
% Note that I've inlined (== copy-pasted) the original laprnd()
% function call. This was necessary to work around limitations
% in loops in Matlab, and prevent the necessity of those HUGE
% variables.
%
% Of course, you can still easily generalize all of this:
% the new data
u = rand-0.5;
Ystart = 15;
Xstart = 800*rand-400;
Xend = Xstart - bL*sign(u)*log(1-2*abs(u));
b = (Ystart*Xend)/(Xend-Xstart);
% the test
if ((b < height && b > 0)) ||...
(Xend < width/2 && Xend > -width/2)
hot = hot+1;
% growing an array is perfectly fine when the chances of it
% happening are so slim
X = [X [Xstart; Xend]]; %#ok
end
end
% This is trivial to do here, and prevents an 'else' in the loop
cold = tracks - hot;
% Now plot the chosen ones
h = figure;
hold all
Y = repmat([15;0], 1, size(X,2));
plot(X, Y, 'r');
end
With this implementation, I can do this:
>> tic, MuonTracks(1e8); toc
Elapsed time is 24.738725 seconds.
with a completely negligible memory footprint.
The profiler now also shows a nice and even distribution of effort along the code; no lines that really stand out because of their memory use or performance.
It's possibly not the fastest possible implementation (if anyone sees obvious improvements, please, feel free to edit them in). But, if you're willing to wait, you'll be able to do MuonTracks(1e23) (or higher :)
I've also done an implementation in C, which can be compiled into a Matlab MEX file:
/* DoMuonCounting.c */
#include <math.h>
#include <matrix.h>
#include <mex.h>
#include <time.h>
#include <stdlib.h>
void CountMuons(
unsigned long long tracks,
unsigned long long *hot, unsigned long long *cold, double *Xout);
/* simple little helper functions */
double sign(double x) { return (x>0)-(x<0); }
double rand_double() { return (double)rand()/(double)RAND_MAX; }
/* the gateway function */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
int
dims[] = {1,1};
const mxArray
/* Output arguments */
*hot_out = plhs[0] = mxCreateNumericArray(2,dims, mxUINT64_CLASS,0),
*cold_out = plhs[1] = mxCreateNumericArray(2,dims, mxUINT64_CLASS,0),
*X_out = plhs[2] = mxCreateDoubleMatrix(2,10000, mxREAL);
const unsigned long long
tracks = (const unsigned long long)mxGetPr(prhs[0])[0];
unsigned long long
*hot = (unsigned long long*)mxGetPr(hot_out),
*cold = (unsigned long long*)mxGetPr(cold_out);
double
*Xout = mxGetPr(X_out);
/* call the actual function, and return */
CountMuons(tracks, hot,cold, Xout);
}
// The actual muon counting
void CountMuons(
unsigned long long tracks,
unsigned long long *hot, unsigned long long *cold, double *Xout)
{
const double
width = 1.0e-4,
height = 2.0e-4,
bL = 15.0/sqrt(2.0),
Ystart = 15.0;
double
Xstart,
Xend,
u,
b;
unsigned long long
i = 0ul;
*hot = 0ul;
*cold = tracks;
/* seed the RNG */
srand((unsigned)time(NULL));
/* aaaand start! */
while (i++ < tracks)
{
u = rand_double() - 0.5;
Xstart = 800.0*rand_double() - 400.0;
Xend = Xstart - bL*sign(u)*log(1.0-2.0*fabs(u));
b = (Ystart*Xend)/(Xend-Xstart);
if ((b < height && b > 0.0) || (Xend < width/2.0 && Xend > -width/2.0))
{
Xout[0 + *hot*2] = Xstart;
Xout[1 + *hot*2] = Xend;
++(*hot);
--(*cold);
}
}
}
compile in Matlab with
mex DoMuonCounting.c
(after having run mex -setup :) and then use it in conjunction with a small M-wrapper like this:
function [hot,cold, h] = MuonTrack2(tracks)
% call the MEX function
[hot,cold, Xtmp] = DoMuonCounting(tracks);
% process outputs, and generate plots
hot = uint32(hot); % circumvents limitations in 32-bit matlab
X = Xtmp(:,1:hot);
clear Xtmp
h = NaN;
if ~isempty(X)
h = figure;
hold all
Y = repmat([15;0], 1, hot);
plot(X, Y, 'r');
end
end
which allows me to do
>> tic, MuonTrack2(1e8); toc
Elapsed time is 14.496355 seconds.
Note that the memory footprint of the MEX version is slightly larger, but I think that's nothing to worry about.
The only flaw I see is the fixed maximum number of Muon counts (hard-coded as 10000 as the initial array size of Xout; needed because there are no dynamically growing arrays in standard C)...if you're worried this limit could be broken, simply increase it, change it to be equal to a fraction of tracks, or do some smarter (but more painful) dynamic array-growing tricks.

In Matlab, it is sometimes faster to vectorize rather than use a for loop. For example, this expression:
(Xend(i,1) < width/2 && Xend(i,1) > -width/2) || (b(i,1) < height && b(i,1) > 0)
which is defined for each value of i, can be rewritten in a vectorised manner like this:
isChosen = (Xend(:,1) < width/2 & Xend(:,1) > -width/2) | (b(:,1) < height & b(:,1)>0)
Expressions like Xend(:,1) will give you a column vector, so Xend(:,1) < width/2 will give you a column vector of boolean values. Note that I have used & rather than && - this is because & performs an element-wise logical AND, unlike && which only works on scalar values. In this way you can build the entire expression, such that the variable isChosen holds a column vector of boolean values, one for each row of your Xend/b vectors.
Getting counts is now as simple as this:
hot = sum(isChosen);
since true is represented by 1. And:
cold = sum(~isChosen);
Finally, you can get the data points by using the boolean vector to select rows:
plot(X(:, isChosen),Y(:, isChosen),'r'); % Plot chosen values
hold all;
plot(X(:, ~isChosen),Y(:, ~isChosen),'b'); % Plot unchosen values
EDIT: The code should look like this:
isChosen = (Xend(:,1) < width/2 & Xend(:,1) > -width/2) | (b(:,1) < height & b(:,1)>0);
hot = sum(isChosen);
cold = sum(~isChosen);
plot(X(:, isChosen),Y(:, isChosen),'r'); % Plot chosen values

Approximating inverse trigonometric functions

I have to implement asin, acos and atan in an environment where I have only the following math tools:
sine
cosine
elementary fixed point arithmetic (floating point numbers are not available)
I also already have a reasonably good square root function.
Can I use those to implement reasonably efficient inverse trigonometric functions?
I don't need very high precision (the floating point numbers have very limited precision anyway); a basic approximation will do.
I've already half decided to go with a table lookup, but I would like to know if there is a neater option (one that doesn't need several hundred lines of code just to implement basic math).
EDIT:
To clear things up: I need to run the function hundreds of times per frame at 35 frames per second.

In a fixed-point environment (S15.16) I successfully used the CORDIC algorithm (see Wikipedia for a general description) to compute atan2(y,x), then derived asin() and acos() from that using well-known functional identities that involve the square root:
asin(x) = atan2 (x, sqrt ((1.0 + x) * (1.0 - x)))
acos(x) = atan2 (sqrt ((1.0 + x) * (1.0 - x)), x)
It turns out that finding a useful description of the CORDIC iteration for atan2() on the double is harder than I thought. The following website appears to contain a sufficiently detailed description, and also discusses two alternative approaches, polynomial approximation and lookup tables:
http://ch.mathworks.com/examples/matlab-fixed-point-designer/615-calculate-fixed-point-arctangent
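For orientation, here is a minimal sketch of the vectoring-mode CORDIC iteration for atan2() on S15.16 integers, written in Python. It is not the code from the answer: the iteration count, the table built with `math.atan` (on a real target these would be precomputed constants) and `math.isqrt` standing in for the existing square root routine are all assumptions of this example.

```python
import math

FRAC_BITS = 16                  # S15.16 format
ONE = 1 << FRAC_BITS            # fixed-point representation of 1.0
N_ITER = 16                     # assumed iteration count
# atan(2^-i) in S15.16; precomputed constants on a real target
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]
PI_FX = round(math.pi * ONE)


def cordic_atan2(y, x):
    """atan2 of S15.16 integers y, x; returns radians in S15.16."""
    if x == 0 and y == 0:
        return 0
    angle = 0
    if x < 0:
        # Rotate by 180 degrees into the right half-plane first
        angle = PI_FX if y >= 0 else -PI_FX
        x, y = -x, -y
    for i in range(N_ITER):
        if y > 0:      # rotate clockwise, accumulate +atan(2^-i)
            x, y = x + (y >> i), y - (x >> i)
            angle += ATAN_TABLE[i]
        elif y < 0:    # rotate counter-clockwise, accumulate -atan(2^-i)
            x, y = x - (y >> i), y + (x >> i)
            angle -= ATAN_TABLE[i]
    return angle


def fx_cofunction(x):
    """sqrt((1 + x) * (1 - x)) in S15.16, for |x| <= ONE."""
    return math.isqrt((ONE + x) * (ONE - x))


def fx_asin(x):
    return cordic_atan2(x, fx_cofunction(x))


def fx_acos(x):
    return cordic_atan2(fx_cofunction(x), x)
```

Only shifts, adds and a small table are needed per iteration, which is why the method suits fixed-point targets. In real 32-bit arithmetic the x component grows by roughly 1.65 during the iterations, so the inputs need some headroom.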

Do you need high precision for the arcsin(x) function? If not, you can calculate arcsin at N nodes and keep the values in memory. I suggest using linear approximation: if x = A*x_(N) + (1-A)*x_(N+1), then arcsin(x) ~ A*arcsin(x_(N)) + (1-A)*arcsin(x_(N+1)), where arcsin(x_(N)) and arcsin(x_(N+1)) are known.
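A table with linear interpolation between nodes could look roughly like this. The sketch is in Python with floats for readability; on the fixed-point target the table entries and the weights would be integers, and the node count of 64 is just an assumed starting point.

```python
import math

N = 64  # number of intervals on [0, 1]; assumed, tune for accuracy vs. memory
# Tabulated once up front; a fixed-point build would store these as constants
TABLE = [math.asin(i / N) for i in range(N + 1)]


def asin_lerp(x):
    """Linearly interpolated arcsin for -1 <= x <= 1."""
    sign = -1.0 if x < 0 else 1.0
    pos = abs(x) * N
    i = min(int(pos), N - 1)      # index of the node to the left
    t = pos - i                   # weight of the right-hand node
    return sign * ((1.0 - t) * TABLE[i] + t * TABLE[i + 1])
```

With evenly spaced nodes the error is small in the middle of the range but grows in the last interval before |x| = 1, where arcsin becomes very steep, so that region may need denser nodes or a different treatment.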

You might want to use an approximation: sum an infinite series until the result is close enough for you.
For example:
arcsin(z) = Sigma( ((2n)! / (2^(2n) * (n!)^2)) * (z^(2n+1) / (2n+1)) ) where n in [0, infinity)

http://en.wikipedia.org/wiki/Inverse_trigonometric_functions#Expression_as_definite_integrals
You could do that integration numerically with your square root function, approximating with an infinite series:

Submitting here my answer from this other similar question.
nVidia has some great resources that I've used for my own purposes; a few examples: acos, asin, atan2, etc.
These algorithms produce precise enough results. Here's a straight-up Python example with their code copy-pasted in:
import math
def nVidia_acos(x):
    negate = float(x<0)
    x=abs(x)
    ret = -0.0187293
    ret = ret * x
    ret = ret + 0.0742610
    ret = ret * x
    ret = ret - 0.2121144
    ret = ret * x
    ret = ret + 1.5707288
    ret = ret * math.sqrt(1.0-x)
    ret = ret - 2 * negate * ret
    return negate * 3.14159265358979 + ret
And here are the results for comparison:
nVidia_acos(0.5) result: 1.0471513828611643
math.acos(0.5) result: 1.0471975511965976
That's pretty close! Multiply by 57.29577951 to get results in degrees, which is also from their "degrees" formula.

It should be easy to adapt the following code to fixed point. It employs a rational approximation to calculate the arctangent normalized to the [0, 1) interval (you can multiply it by Pi/2 to get the real arctangent). Then, you can use well-known identities to get the arcsin/arccos from the arctangent.
normalized_atan(x) ~ (b x + x^2) / (1 + 2 b x + x^2)
where b = 0.596227
The maximum error is 0.1620°
#include <stdint.h>
#include <math.h>
// Approximates atan(x) normalized to the [-1,1] range
// with a maximum error of 0.1620 degrees.
float norm_atan( float x )
{
static const uint32_t sign_mask = 0x80000000;
static const float b = 0.596227f;
// Extract the sign bit
uint32_t ux_s = sign_mask & (uint32_t &)x;
// Calculate the arctangent in the first quadrant
float bx_a = ::fabs( b * x );
float num = bx_a + x * x;
float atan_1q = num / ( 1.f + bx_a + num );
// Restore the sign bit
uint32_t atan_2q = ux_s | (uint32_t &)atan_1q;
return (float &)atan_2q;
}
// Approximates atan2(y, x) normalized to the [0,4) range
// with a maximum error of 0.1620 degrees
float norm_atan2( float y, float x )
{
static const uint32_t sign_mask = 0x80000000;
static const float b = 0.596227f;
// Extract the sign bits
uint32_t ux_s = sign_mask & (uint32_t &)x;
uint32_t uy_s = sign_mask & (uint32_t &)y;
// Determine the quadrant offset
float q = (float)( ( ~ux_s & uy_s ) >> 29 | ux_s >> 30 );
// Calculate the arctangent in the first quadrant
float bxy_a = ::fabs( b * x * y );
float num = bxy_a + y * y;
float atan_1q = num / ( x * x + bxy_a + num );
// Translate it to the proper quadrant
uint32_t uatan_2q = (ux_s ^ uy_s) | (uint32_t &)atan_1q;
return q + (float &)uatan_2q;
}
In case you need more precision, there is a 3rd order rational function:
normalized_atan(x) ~ ( c x + x^2 + x^3) / ( 1 + (c + 1) x + (c + 1) x^2 + x^3)
where c = (1 + sqrt(17)) / 8
which has a maximum approximation error of 0.00811°
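The two rational formulas are easy to check numerically. The following Python sketch evaluates both for floats and scans a grid for the largest deviation from the true arctangent; the function names, the grid size and the scan range are choices made for this example only.

```python
import math

B = 0.596227                         # second-order constant quoted above
C = (1.0 + math.sqrt(17.0)) / 8.0    # third-order constant quoted above


def norm_atan_2nd(x):
    """(b x + x^2) / (1 + 2 b x + x^2), extended to negative x by symmetry."""
    ax = abs(x)
    r = (B * ax + ax * ax) / (1.0 + 2.0 * B * ax + ax * ax)
    return math.copysign(r, x)


def norm_atan_3rd(x):
    """(c x + x^2 + x^3) / (1 + (c+1) x + (c+1) x^2 + x^3), odd-extended."""
    ax = abs(x)
    x2, x3 = ax * ax, ax * ax * ax
    r = (C * ax + x2 + x3) / (1.0 + (C + 1.0) * ax + (C + 1.0) * x2 + x3)
    return math.copysign(r, x)


def max_error_deg(f, n=20000, x_max=50.0):
    """Largest |90 * f(x) - atan(x) in degrees| on a grid over [0, x_max]."""
    worst = 0.0
    for k in range(n + 1):
        x = x_max * k / n
        worst = max(worst, abs(f(x) * 90.0 - math.degrees(math.atan(x))))
    return worst
```

Multiplying the normalized result by 90 gives degrees (or by Pi/2 for radians). Both formulas satisfy f(x) + f(1/x) = 1, so the error for x > 1 mirrors the error for x < 1.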

Maybe some kind of intelligent brute force like Newton-Raphson.
So to solve asin() you iterate on sin() until it matches the target value.
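As a rough illustration of that idea, the sketch below solves sin(x) = y with the Newton-Raphson update x <- x - (sin(x) - y) / cos(x). It is written in Python with floating point for clarity; the fixed iteration count and the starting guess x = y are assumptions of this example, and a fixed-point version would use the available sine and cosine routines instead of `math.sin` and `math.cos`.

```python
import math


def asin_newton(y, iterations=5):
    """Solve sin(x) = y for x in [-pi/2, pi/2] by Newton-Raphson."""
    x = y  # starting guess: asin(y) is close to y for small |y|
    for _ in range(iterations):
        c = math.cos(x)
        if c == 0.0:
            break  # derivative vanished; cannot take another step
        x -= (math.sin(x) - y) / c
    return x
```

Convergence is quick for moderate |y|, but it slows down as |y| approaches 1 because the derivative cos(x) goes to zero there, so more iterations (or a range reduction) would be needed near the ends.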

Use a polynomial approximation. Least-squares fit is easiest (Microsoft Excel has it) and Chebyshev approximation is more accurate.
This question has been covered before: How do Trigonometric functions work?

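Only continuous functions can be approximated well by polynomials, and arcsin(x) has an unbounded derivative at x=1 (the same holds for arccos(x)), so a polynomial fit behaves badly there. A range reduction that keeps the interval [sqrt(1/2), 1] out of the approximation avoids this situation. We have arcsin(x) = pi/2 - arccos(x) and arccos(x) = pi/2 - arcsin(x). You can use Matlab to compute a minimax approximation. Approximate only on the range [0, sqrt(1/2)]; if the argument of arcsin is larger than sqrt(1/2), work with the cosine sqrt(1 - x^2) instead, since arcsin(x) = arccos(sqrt(1 - x^2)) for 0 <= x <= 1. Likewise, approximate the arctangent only for x < 1 and use arctan(x) = pi/2 - arctan(1/x) for larger x.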

How do you calculate the average of a set of circular data?

I want to calculate the average of a set of circular data. For example, I might have several samples from the reading of a compass. The problem of course is how to deal with the wraparound. The same algorithm might be useful for a clockface.
The actual question is more complicated - what do statistics mean on a sphere or in an algebraic space which "wraps around", e.g. the additive group mod n. The answer may not be unique, e.g. the average of 359 degrees and 1 degree could be 0 degrees or 180, but statistically 0 looks better.
This is a real programming problem for me and I'm trying to make it not look like just a Math problem.

Compute unit vectors from the angles and take the angle of their average.

This question is examined in detail in the book:
"Statistics On Spheres", Geoffrey S. Watson, University of Arkansas Lecture
Notes in the Mathematical Sciences, 1983 John Wiley & Sons, Inc. as mentioned at http://catless.ncl.ac.uk/Risks/7.44.html#subj4 by Bruce Karsh.
A good way to estimate an average angle, A, from a set of N angle measurements a[i] is:
A = arctangent( sum_i_from_1_to_N sin(a[i]) / sum_i_from_1_to_N cos(a[i]) )
The method given by starblue is computationally equivalent, but his reasons are clearer and probably programmatically more efficient, and also work well in the zero case, so kudos to him.
The subject is now explored in more detail on Wikipedia, and with other uses, like fractional parts.

I see the problem - for example, if you have a 45 degree angle and a 315 degree angle, the "natural" average would be 180 degrees, but the value you want is actually 0 degrees.
I think Starblue is onto something. Just calculate the (x, y) cartesian coordinates for each angle, and add those resulting vectors together. The angular offset of the final vector should be your required result.
x = y = 0
foreach angle {
x += cos(angle)
y += sin(angle)
}
average_angle = atan2(y, x)
I'm ignoring for now that a compass heading starts at north, and goes clockwise, whereas "normal" cartesian coordinates start with zero along the X axis, and then go anti-clockwise. The maths should work out the same way regardless.
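A runnable version of the pseudocode above might look like this in Python. The function name, the use of degrees and the threshold for a (near) zero resultant vector are choices made for this sketch.

```python
import math


def mean_angle_deg(angles_deg):
    """Mean direction of angles in degrees via the sum of unit vectors.

    Returns a value in [0, 360), or None when the vectors cancel out and
    the mean direction is undefined (for example 0 and 180 degrees).
    """
    x = sum(math.cos(math.radians(a)) for a in angles_deg)
    y = sum(math.sin(math.radians(a)) for a in angles_deg)
    if math.hypot(x, y) < 1e-12:
        return None
    return math.degrees(math.atan2(y, x)) % 360.0
```

Using atan2 rather than a plain arctangent of y/x keeps the quadrant information and avoids a division by zero when the x sum vanishes.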

FOR THE SPECIAL CASE OF TWO ANGLES:
The answer ( (a + b) mod 360 ) / 2 is WRONG. For angles 350 and 2, the closest point is 356, not 176.
The unit vector and trig solutions may be too expensive.
What I've got from a little tinkering is:
diff = ( ( a - b + 180 + 360 ) mod 360 ) - 180
angle = (360 + b + ( diff / 2 ) ) mod 360
0, 180 -> 90 (two answers for this: this equation takes the clockwise answer from a)
180, 0 -> 270 (see above)
180, 1 -> 90.5
1, 180 -> 90.5
20, 350 -> 5
350, 20 -> 5 (all following examples reverse properly too)
10, 20 -> 15
350, 2 -> 356
359, 0 -> 359.5
180, 180 -> 180
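The two formulas translate directly into code. Here is a small Python sketch (the function name is made up, and the inputs are assumed to be in degrees within [0, 360)):

```python
def mean_of_two(a, b):
    """Midpoint of the shorter arc between angles a and b, in degrees."""
    # Signed shortest difference from b to a, in [-180, 180)
    diff = ((a - b + 180 + 360) % 360) - 180
    # Step half of that difference away from b, wrapped into [0, 360)
    return (360 + b + diff / 2) % 360
```

No trigonometric calls are involved, only additions and modulo operations, which is the point of this special case.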

ackb is right that these vector-based solutions cannot be considered true averages of angles; they are only an average of the unit vector counterparts. However, ackb's suggested solution does not appear to be mathematically sound.
The following is a solution that is mathematically derived from the goal of minimising (angle[i] - avgAngle)^2 (where the difference is corrected if necessary), which makes it a true arithmetic mean of the angles.
First, we need to look at exactly which cases make the difference between angles differ from the difference between their ordinary number counterparts. Consider angles x and y: if y >= x - 180 and y <= x + 180, then we can use the difference (x-y) directly. Otherwise, if the first condition is not met, then we must use (y+360) in the calculation instead of y. Correspondingly, if the second condition is not met, then we must use (y-360) instead of y. Since the equation of the curve we are minimising only changes at the points where these inequalities change from true to false or vice versa, we can separate the full [0,360) range into a set of segments, separated by these points. Then, we only need to find the minimum of each of these segments, and then the minimum of each segment's minimum, which is the average.
Here's an image demonstrating where the problems occur in calculating angle differences. If x lies in the gray area then there will be a problem.
To minimise a variable, depending on the curve, we can take the derivative of what we want to minimise and then we find the turning point (which is where the derivative = 0).
Here we will apply the idea of minimising the squared difference to derive the common arithmetic mean formula: sum(a[i])/n. The curve y = sum((a[i]-x)^2) can be minimised in this way:
y = sum((a[i]-x)^2)
= sum(a[i]^2 - 2*a[i]*x + x^2)
= sum(a[i]^2) - 2*x*sum(a[i]) + n*x^2
dy/dx = -2*sum(a[i]) + 2*n*x
for dy/dx = 0:
-2*sum(a[i]) + 2*n*x = 0
-> n*x = sum(a[i])
-> x = sum(a[i])/n
Now applying it to curves with our adjusted differences:
b = subset of a where the correct (angular) difference is a[i]-x
c = subset of a where the correct (angular) difference is (a[i]-360)-x
cn = size of c
d = subset of a where the correct (angular) difference is (a[i]+360)-x
dn = size of d
y = sum((b[i]-x)^2) + sum(((c[i]-360)-x)^2) + sum(((d[i]+360)-x)^2)
= sum(b[i]^2 - 2*b[i]*x + x^2)
+ sum((c[i]-360)^2 - 2*(c[i]-360)*x + x^2)
+ sum((d[i]+360)^2 - 2*(d[i]+360)*x + x^2)
= sum(b[i]^2) - 2*x*sum(b[i])
+ sum((c[i]-360)^2) - 2*x*(sum(c[i]) - 360*cn)
+ sum((d[i]+360)^2) - 2*x*(sum(d[i]) + 360*dn)
+ n*x^2
= sum(b[i]^2) + sum((c[i]-360)^2) + sum((d[i]+360)^2)
- 2*x*(sum(b[i]) + sum(c[i]) + sum(d[i]))
- 2*x*(360*dn - 360*cn)
+ n*x^2
= sum(b[i]^2) + sum((c[i]-360)^2) + sum((d[i]+360)^2)
- 2*x*sum(a[i])
- 2*x*360*(dn - cn)
+ n*x^2
dy/dx = 2*n*x - 2*sum(a[i]) - 2*360*(dn - cn)
for dy/dx = 0:
2*n*x - 2*sum(a[i]) - 2*360*(dn - cn) = 0
n*x = sum(a[i]) + 360*(dn - cn)
x = (sum(a[i]) + 360*(dn - cn))/n
This alone is not quite enough to get the minimum. It works for ordinary values because their domain is unbounded, so the result always lies within the valid range. Here we need the minimum within a range (defined by the segment). If the minimum is less than our segment's lower bound then the minimum of that segment must be at the lower bound (because quadratic curves only have 1 turning point) and if the minimum is greater than our segment's upper bound then the segment's minimum is at the upper bound. After we have the minimum for each segment, we simply find the one that has the lowest value for what we're minimising (sum((b[i]-x)^2) + sum(((c[i]-360)-x)^2) + sum(((d[i]+360)-x)^2)).
Here is an image to the curve, which shows how it changes at the points where x=(a[i]+180)%360. The data set is in question is {65,92,230,320,250}.
Here is an implementation of the algorithm in Java, including some optimisations, its complexity is O(nlogn). It can be reduced to O(n) if you replace the comparison based sort with a non comparison based sort, such as radix sort.
static double varnc(double _mean, int _n, double _sumX, double _sumSqrX)
{
return _mean*(_n*_mean - 2*_sumX) + _sumSqrX;
}
//with lower correction
static double varlc(double _mean, int _n, double _sumX, double _sumSqrX, int _nc, double _sumC)
{
return _mean*(_n*_mean - 2*_sumX) + _sumSqrX
+ 2*360*_sumC + _nc*(-2*360*_mean + 360*360);
}
//with upper correction
static double varuc(double _mean, int _n, double _sumX, double _sumSqrX, int _nc, double _sumC)
{
return _mean*(_n*_mean - 2*_sumX) + _sumSqrX
- 2*360*_sumC + _nc*(2*360*_mean + 360*360);
}
static double[] averageAngles(double[] _angles)
{
double sumAngles;
double sumSqrAngles;
double[] lowerAngles;
double[] upperAngles;
{
List<Double> lowerAngles_ = new LinkedList<Double>();
List<Double> upperAngles_ = new LinkedList<Double>();
sumAngles = 0;
sumSqrAngles = 0;
for(double angle : _angles)
{
sumAngles += angle;
sumSqrAngles += angle*angle;
if(angle < 180)
lowerAngles_.add(angle);
else if(angle > 180)
upperAngles_.add(angle);
}
Collections.sort(lowerAngles_);
Collections.sort(upperAngles_,Collections.reverseOrder());
lowerAngles = new double[lowerAngles_.size()];
Iterator<Double> lowerAnglesIter = lowerAngles_.iterator();
for(int i = 0; i < lowerAngles_.size(); i++)
lowerAngles[i] = lowerAnglesIter.next();
upperAngles = new double[upperAngles_.size()];
Iterator<Double> upperAnglesIter = upperAngles_.iterator();
for(int i = 0; i < upperAngles_.size(); i++)
upperAngles[i] = upperAnglesIter.next();
}
List<Double> averageAngles = new LinkedList<Double>();
averageAngles.add(180d);
double variance = varnc(180,_angles.length,sumAngles,sumSqrAngles);
double lowerBound = 180;
double sumLC = 0;
for(int i = 0; i < lowerAngles.length; i++)
{
//get average for a segment based on minimum
double testAverageAngle = (sumAngles + 360*i)/_angles.length;
//minimum is outside segment range (therefore not directly relevant)
//since it is greater than lowerAngles[i], the minimum for the segment
//must lie on the boundary lowerAngles[i]
if(testAverageAngle > lowerAngles[i]+180)
testAverageAngle = lowerAngles[i];
if(testAverageAngle > lowerBound)
{
double testVariance = varlc(testAverageAngle,_angles.length,sumAngles,sumSqrAngles,i,sumLC);
if(testVariance < variance)
{
averageAngles.clear();
averageAngles.add(testAverageAngle);
variance = testVariance;
}
else if(testVariance == variance)
averageAngles.add(testAverageAngle);
}
lowerBound = lowerAngles[i];
sumLC += lowerAngles[i];
}
//Test last segment
{
//get average for a segment based on minimum
double testAverageAngle = (sumAngles + 360*lowerAngles.length)/_angles.length;
//minimum is inside segment range
//we will test average 0 (360) later
if(testAverageAngle < 360 && testAverageAngle > lowerBound)
{
double testVariance = varlc(testAverageAngle,_angles.length,sumAngles,sumSqrAngles,lowerAngles.length,sumLC);
if(testVariance < variance)
{
averageAngles.clear();
averageAngles.add(testAverageAngle);
variance = testVariance;
}
else if(testVariance == variance)
averageAngles.add(testAverageAngle);
}
}
double upperBound = 180;
double sumUC = 0;
for(int i = 0; i < upperAngles.length; i++)
{
//get average for a segment based on minimum
double testAverageAngle = (sumAngles - 360*i)/_angles.length;
//minimum is outside segment range (therefore not directly relevant)
//since it is less than upperAngles[i]-180, the minimum for the segment
//must lie on the boundary upperAngles[i]
if(testAverageAngle < upperAngles[i]-180)
testAverageAngle = upperAngles[i];
if(testAverageAngle < upperBound)
{
double testVariance = varuc(testAverageAngle,_angles.length,sumAngles,sumSqrAngles,i,sumUC);
if(testVariance < variance)
{
averageAngles.clear();
averageAngles.add(testAverageAngle);
variance = testVariance;
}
else if(testVariance == variance)
averageAngles.add(testAverageAngle);
}
upperBound = upperAngles[i];
sumUC += upperBound;
}
//Test last segment
{
//get average for a segment based on minimum
double testAverageAngle = (sumAngles - 360*upperAngles.length)/_angles.length;
//minimum is inside segment range
//we test average 0 (360) now
if(testAverageAngle < 0)
testAverageAngle = 0;
if(testAverageAngle < upperBound)
{
double testVariance = varuc(testAverageAngle,_angles.length,sumAngles,sumSqrAngles,upperAngles.length,sumUC);
if(testVariance < variance)
{
averageAngles.clear();
averageAngles.add(testAverageAngle);
variance = testVariance;
}
else if(testVariance == variance)
averageAngles.add(testAverageAngle);
}
}
double[] averageAngles_ = new double[averageAngles.size()];
Iterator<Double> averageAnglesIter = averageAngles.iterator();
for(int i = 0; i < averageAngles_.length; i++)
averageAngles_[i] = averageAnglesIter.next();
return averageAngles_;
}
The arithmetic mean of a set of angles may not agree with your intuitive idea of what the average should be. For example, the arithmetic mean of the set {179,179,0,181,181} is 216 (and 144). The answer you immediately think of is probably 180; however, it is well known that the arithmetic mean is heavily affected by edge values. You should also remember that angles are not vectors, as appealing as that may sometimes seem when dealing with angles.
This algorithm does of course also apply (with minimal adjustment) to all quantities that obey modular arithmetic, such as the time of day.
I would also like to stress that even though this is a true average of angles, unlike the vector solutions, that does not necessarily mean it is the solution you should be using; the average of the corresponding unit vectors may well be the value you actually should be using.
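The {179,179,0,181,181} example can be checked with a small brute-force sketch in Python. This is not the segment-based Java algorithm above: it only assumes the x = (sum + 360*k)/n form of the per-segment minimum, tries every integer offset k from -n to n, and evaluates the true angular cost at each candidate, so it is O(n^2). The names circ_diff, circ_cost and circular_means are made up for this illustration.

```python
def circ_diff(a, x):
    """Signed angular difference a - x in degrees, folded into [-180, 180)."""
    return (a - x + 180.0) % 360.0 - 180.0

def circ_cost(angles, x):
    """Sum of squared angular differences between x and every angle."""
    return sum(circ_diff(a, x) ** 2 for a in angles)

def circular_means(angles, tol=1e-9):
    """All candidates (sum + 360*k)/n that reach the lowest cost."""
    n = len(angles)
    total = sum(angles)
    # One candidate per possible integer offset k, reduced to [0, 360)
    candidates = {round(((total + 360.0 * k) / n) % 360.0, 9)
                  for k in range(-n, n + 1)}
    best = min(circ_cost(angles, x) for x in candidates)
    return sorted(x for x in candidates if circ_cost(angles, x) - best <= tol)

print(circular_means([179, 179, 0, 181, 181]))   # [144.0, 216.0]
```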

You have to define average more accurately. For the specific case of two angles, I can think of two different scenarios:
The "true" average, i.e. (a + b) / 2 % 360.
The angle that points "between" the two others while staying in the same semicircle, e.g. for 355 and 5, this would be 0, not 180. To do this, you need to check if the difference between the two angles is larger than 180 or not. If so, increment the smaller angle by 360 before using the above formula.
I don't see how the second alternative can be generalized for the case of more than two angles, though.
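A minimal Python sketch of the second scenario, for two angles in degrees. The function name average_two is only illustrative; the logic is just the "add 360 to the smaller angle if they are more than 180 apart" rule described above.

```python
def average_two(a, b):
    """Average two angles in degrees, staying on the shorter arc between them."""
    a %= 360.0
    b %= 360.0
    if abs(a - b) > 180.0:
        # Lift the smaller angle by a full turn before averaging
        if a < b:
            a += 360.0
        else:
            b += 360.0
    return ((a + b) / 2.0) % 360.0

print(average_two(355, 5))   # 0.0
print(average_two(10, 30))   # 20.0
```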

I'd like to share a method I used with a microcontroller which did not have floating point or trigonometry capabilities. I still needed to "average" 10 raw bearing readings in order to smooth out variations.
Check whether the first bearing is in the range 270-360 or 0-90 degrees (the northern two quadrants).
If it is, rotate this and all subsequent readings by 180 degrees, keeping all values in the range 0 <= bearing < 360. Otherwise take the readings as they come.
Once 10 readings have been taken, calculate the numerical average, assuming that there has been no wraparound.
If the 180 degree rotation had been in effect, then rotate the calculated average by 180 degrees to get back to a "true" bearing.
It's not ideal; it can break. I got away with it in this case because the device only rotates very slowly. I'll put it out there in case anyone else finds themselves working under similar restrictions.
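For anyone who wants to try the steps above, here is a rough integer-only sketch in Python. The function name smooth_bearings is made up, and two details are my assumptions rather than part of the description: the quadrant boundaries 270 and 90 are treated as inclusive, and the average is truncated by integer division.

```python
def smooth_bearings(readings):
    """Integer-only average of bearings in [0, 360), following the steps above."""
    first = readings[0]
    # Northern two quadrants: rotate everything by 180 to move away from the 0/360 seam
    rotate = first >= 270 or first <= 90
    total = 0
    for r in readings:
        if rotate:
            r = (r + 180) % 360
        total += r
    average = total // len(readings)   # plain average, assuming no wraparound
    if rotate:
        average = (average + 180) % 360   # rotate back to a "true" bearing
    return average

print(smooth_bearings([350, 10, 355, 5]))   # 0
```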

Like all averages, the answer depends upon the choice of metric. For a given metric M, the average of some angles a_k in [-pi,pi] for k in [1,N] is that angle a_M which minimizes the sum of squared distances d^2_M(a_M,a_k). For a weighted mean, one simply includes in the sum the weights w_k (such that sum_k w_k = 1). That is,
a_M = arg min_x sum_k w_k d^2_M(x,a_k)
Two common choices of metric are the Frobenius and the Riemann metrics. For the Frobenius metric, a direct formula exists that corresponds to the usual notion of average bearing in circular statistics. See "Means and Averaging in the Group of Rotations", Maher Moakher, SIAM Journal on Matrix Analysis and Applications, Volume 24, Issue 1, 2002, for details.
http://link.aip.org/link/?SJMAEL/24/1/1
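To illustrate the "direct formula" for the Frobenius case, here is a small Python sketch. It assumes the Frobenius distance between two angles is taken as the chord length |e^{ix} - e^{ia}| (this is what the 'F' branch of meritfcn in the Octave code below uses), and compares the closed form, the argument of the weighted sum of unit complex numbers, against a plain grid search. The function names are made up for this illustration.

```python
import cmath
import math

def frobenius_mean(angles, weights):
    """Closed form: argument of the weighted sum of unit complex numbers."""
    return cmath.phase(sum(w * cmath.exp(1j * a) for a, w in zip(angles, weights)))

def chordal_cost(x, angles, weights):
    """Weighted sum of squared chord lengths |e^{ix} - e^{ia_k}|^2."""
    return sum(w * abs(cmath.exp(1j * x) - cmath.exp(1j * a)) ** 2
               for a, w in zip(angles, weights))

def grid_argmin(angles, weights, steps=3600):
    """Brute-force minimiser of chordal_cost on a regular grid over [-pi, pi)."""
    grid = [-math.pi + 2 * math.pi * k / steps for k in range(steps)]
    return min(grid, key=lambda x: chordal_cost(x, angles, weights))

angles = [0.1, 0.3, -0.2]    # radians, illustrative data only
weights = [0.2, 0.5, 0.3]    # must sum to 1
print(frobenius_mean(angles, weights), grid_argmin(angles, weights))
```

The two printed values should agree to within the grid spacing.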
Here's a function for GNU Octave 3.2.4 that does the computation:
function ma=meanangleoct(a,w,hp,ntype)
% ma=meanangleoct(a,w,hp,ntype) returns the average of angles a
% given weights w and half-period hp using norm type ntype
% Ref: "Means and Averaging in the Group of Rotations",
% Maher Moakher, SIAM Journal on Matrix Analysis and Applications,
% Volume 24, Issue 1, 2002.
if (nargin<1) | (nargin>4), help meanangleoct, return, end
if isempty(a), error('no measurement angles'), end
la=length(a); sa=size(a);
if prod(sa)~=la, error('a must be a vector'); end
if (nargin<4) || isempty(ntype), ntype='F'; end
if ~sum(ntype==['F' 'R']), error('ntype must be F or R'), end
if (nargin<3) || isempty(hp), hp=pi; end
if (nargin<2) || isempty(w), w=1/la+0*a; end
lw=length(w); sw=size(w);
if prod(sw)~=lw, error('w must be a vector'); end
if lw~=la, error('length of w must equal length of a'), end
if sum(w)~=1, warning('resumming weights to unity'), w=w/sum(w); end
a=a(:); % make column vector
w=w(:); % make column vector
a=mod(a+hp,2*hp)-hp; % reduce to central period
a=a/hp*pi; % scale to half period pi
z=exp(i*a); % U(1) elements
% % NOTA BENE:
% % fminbnd can get hung up near the boundaries.
% % If that happens, shift the input angles a
% % forward by one half period, then shift the
% % resulting mean ma back by one half period.
% X=fminbnd(@meritfcn,-pi,pi,[],z,w,ntype);
% % seems to work better
x0=imag(log(sum(w.*z)));
X=fminbnd(@meritfcn,x0-pi,x0+pi,[],z,w,ntype);
% X=real(X); % truncate some roundoff
X=mod(X+pi,2*pi)-pi; % reduce to central period
ma=X*hp/pi; % scale to half period hp
return
%%%%%%
function d2=meritfcn(x,z,w,ntype)
x=exp(i*x);
if ntype=='F'
y=x-z;
else % ntype=='R'
y=log(x'*z);
end
d2=y'*diag(w)*y;
return
%%%%%%
% % test script
% %
% % NOTA BENE: meanangleoct(a,[],[],'R') will equal mean(a)
% % when all abs(a-b) < pi/2 for some value b
% %
% na=3, a=sort(mod(randn(1,na)+1,2)-1)*pi;
% da=diff([a a(1)+2*pi]); [mda,ndx]=min(da);
% a=circshift(a,[0 2-ndx]) % so that diff(a(2:3)) is smallest
% A=exp(i*a), B1=expm(a(1)*[0 -1; 1 0]),
% B2=expm(a(2)*[0 -1; 1 0]), B3=expm(a(3)*[0 -1; 1 0]),
% masimpl=[angle(mean(exp(i*a))) mean(a)]
% Bsum=B1+B2+B3; BmeanF=Bsum/sqrt(det(Bsum));
% % this expression for BmeanR should be correct for ordering of a above
% BmeanR=B1*(B1'*B2*(B2'*B3)^(1/2))^(2/3);
% mamtrx=real([[0 1]*logm(BmeanF)*[1 0]' [0 1]*logm(BmeanR)*[1 0]'])
% manorm=[meanangleoct(a,[],[],'F') meanangleoct(a,[],[],'R')]
% polar(a,1+0*a,'b*'), axis square, hold on
% polar(manorm(1),1,'rs'), polar(manorm(2),1,'gd'), hold off
% Meanangleoct Version 1.0
% Copyright (C) 2011 Alphawave Research, robjohnson@alphawaveresearch.com
% Released under GNU GPLv3 -- see file COPYING for more info.
%
% Meanangle is free software: you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation, either version 3 of the License, or (at
% your option) any later version.
%
% Meanangle is distributed in the hope that it will be useful, but
% WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
% General Public License for more details.
%
% You should have received a copy of the GNU General Public License
% along with this program. If not, see `http://www.gnu.org/licenses/'.

In Python, with angles in [-180, 180):
def add_angles(a, b):
    return (a + b + 180) % 360 - 180
def average_angles(a, b):
    return add_angles(a, add_angles(-a, b)/2)
Details:
For the average of two angles there are two averages 180° apart, but we may want the closer average.
Visually, the average of the blue (b) and green (a) angles yields the teal point:
Angles 'wrap around' (e.g. 355 + 10 = 5), but standard arithmetic will ignore this branch point.
However, if angle b is opposite the branch point, then (a + b)/2 gives the closest average: the teal point.
For any two angles, we can rotate the problem so one of the angles is opposite to the branch point, perform standard averaging, then rotate back.
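A short usage example of the two functions (repeated here so the snippet runs on its own). The comment in average_angles is my reading of the "rotate, average, rotate back" idea.

```python
def add_angles(a, b):
    return (a + b + 180) % 360 - 180

def average_angles(a, b):
    # rotate so that a sits at 0, halve the (wrapped) offset to b, rotate back
    return add_angles(a, add_angles(-a, b) / 2)

print(average_angles(-10, 10))     # 0.0
print(average_angles(170, -170))   # -180.0
```

Note that the second result is -180 rather than 180 because the output range is [-180, 180).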

Here is the full solution:
(The input is an array of bearings in degrees, 0-360.)
public static int getAvarageBearing(int[] arr)
{
double sunSin = 0;
double sunCos = 0;
int counter = 0;
for (double bearing : arr)
{
bearing *= Math.PI/180;
sunSin += Math.sin(bearing);
sunCos += Math.cos(bearing);
counter++;
}
int avBearing = INVALID_ANGLE_VALUE;
if (counter > 0)
{
double bearingInRad = Math.atan2(sunSin/counter, sunCos/counter);
avBearing = (int) (bearingInRad*180f/Math.PI);
if (avBearing<0)
avBearing += 360;
}
return avBearing;
}

In English:
Make a second data set with all angles shifted by 180.
Take the variance of both data sets.
Take the average of the data set with the smallest variance.
If this average is from the shifted set then shift the answer again by 180.
In Python:
import numpy as np
# A is a numpy Nx1 array of angles in degrees
if np.var(A) < np.var((A-180)%360):
    average = np.average(A)
else:
    average = (np.average((A-180)%360)+180)%360
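The same idea as a self-contained function, in case numpy is not available. This is only a sketch: the name mean_by_variance is made up, the input is assumed to already be in [0, 360), and the population variance from the statistics module is used to match np.var.

```python
from statistics import pvariance

def mean_by_variance(angles):
    """Average the data set (original or shifted by 180) with the smaller variance."""
    shifted = [(a - 180.0) % 360.0 for a in angles]
    if pvariance(angles) < pvariance(shifted):
        return sum(angles) / len(angles)
    # The shifted set was tighter: average it, then shift the answer back
    return (sum(shifted) / len(shifted) + 180.0) % 360.0

print(mean_by_variance([350, 10]))   # 0.0
```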

If anyone is looking for a JavaScript solution to this, I've translated the example given on the Wikipedia page Mean of circular quantities (which was also referred to in Nick's answer) into JavaScript/NodeJS code, with help from the mathjs library.
If your angles are in degrees:
const maths = require('mathjs');
getAverageDegrees = (array) => {
let arrayLength = array.length;
let sinTotal = 0;
let cosTotal = 0;
for (let i = 0; i < arrayLength; i++) {
sinTotal += maths.sin(array[i] * (maths.pi / 180));
cosTotal += maths.cos(array[i] * (maths.pi / 180));
}
let averageDirection = maths.atan(sinTotal / cosTotal) * (180 / maths.pi);
if (cosTotal < 0) {
averageDirection += 180;
} else if (sinTotal < 0) {
averageDirection += 360;
}
return averageDirection;
}
This solution worked really well for me in order to find the average direction from a set of compass directions. I've tested this on a large range of directional data (0-360 degrees) and it seems very robust.
Alternatively, if your angles are in radians:
const maths = require('mathjs');
getAverageRadians = (array) => {
let arrayLength = array.length;
let sinTotal = 0;
let cosTotal = 0;
for (let i = 0; i < arrayLength; i++) {
sinTotal += maths.sin(array[i]);
cosTotal += maths.cos(array[i]);
}
let averageDirection = maths.atan(sinTotal / cosTotal);
// the quadrant corrections must be in radians here, not degrees
if (cosTotal < 0) {
averageDirection += maths.pi;
} else if (sinTotal < 0) {
averageDirection += 2 * maths.pi;
}
return averageDirection;
}
Hopefully these solutions are helpful to someone facing a similar programming challenge to me.

I would go the vector way using complex numbers. My example is in Python, which has built-in complex numbers:
import cmath # complex math
def average_angle(list_of_angles):
    # make a new list of vectors
    vectors= [cmath.rect(1, angle) # length 1 for each vector
        for angle in list_of_angles]
    vector_sum= sum(vectors)
    # no need to average, we don't care for the modulus
    return cmath.phase(vector_sum)
Note that Python does not need to build a temporary new list of vectors, all of the above can be done in one step; I just chose this way to approximate pseudo-code applicable to other languages too.
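For reference, the "one step" version mentioned above might look like this (angles in radians, as cmath expects):

```python
import cmath

def average_angle(angles):
    # generator expression: no intermediate list of vectors is built
    return cmath.phase(sum(cmath.rect(1, a) for a in angles))

print(average_angle([1.0, 2.0]))   # close to 1.5
```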

Here's a complete C++ solution:
#include <vector>
#include <cmath>
double dAngleAvg(const std::vector<double>& angles) {
auto avgSin = double{ 0.0 };
auto avgCos = double{ 0.0 };
static const auto conv = double{ 0.01745329251994 }; // PI / 180
static const auto i_conv = double{ 57.2957795130823 }; // 180 / PI
for (const auto& theta : angles) {
avgSin += sin(theta*conv);
avgCos += cos(theta*conv);
}
avgSin /= (double)angles.size();
avgCos /= (double)angles.size();
auto ret = double{ 90.0 - atan2(avgCos, avgSin) * i_conv };
if (ret<0.0) ret += 360.0;
return fmod(ret, 360.0);
}
It takes the angles in the form of a vector of doubles, and returns the average simply as a double. The angles must be in degrees, and of course the average is in degrees as well.

Based on Alnitak's answer, I've written a Java method for calculating the average of multiple angles:
If your angles are in radians:
public static double averageAngleRadians(double... angles) {
double x = 0;
double y = 0;
for (double a : angles) {
x += Math.cos(a);
y += Math.sin(a);
}
return Math.atan2(y, x);
}
If your angles are in degrees:
public static double averageAngleDegrees(double... angles) {
double x = 0;
double y = 0;
for (double a : angles) {
x += Math.cos(Math.toRadians(a));
y += Math.sin(Math.toRadians(a));
}
return Math.toDegrees(Math.atan2(y, x));
}

Here's an idea: build the average iteratively by always calculating the average of the angles that are closest together, keeping a weight.
Another idea: find the largest gap between the given angles. Find the point that bisects it, and then pick the opposite point on the circle as the reference zero to calculate the average from.
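A rough Python sketch of the second idea (largest gap), with angles in degrees. The function name and the details of how the gap is found are my own choices for illustration, not a reference implementation.

```python
def mean_via_largest_gap(angles):
    """Average angles relative to the point opposite the middle of the largest gap."""
    s = sorted(a % 360.0 for a in angles)
    n = len(s)
    best_gap, best_start = -1.0, 0.0
    for i in range(n):
        start = s[i]
        # The last gap wraps around from the largest angle back to the smallest
        end = s[(i + 1) % n] + (360.0 if i == n - 1 else 0.0)
        if end - start > best_gap:
            best_gap, best_start = end - start, start
    # Bisect the largest gap, then take the opposite point as the reference zero
    reference = (best_start + best_gap / 2.0 + 180.0) % 360.0
    offsets = [(a - reference + 180.0) % 360.0 - 180.0 for a in s]
    return (reference + sum(offsets) / n) % 360.0

print(mean_via_largest_gap([350, 10, 20]))   # about 6.67
```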

Let's represent these angles with points on the circumference of the circle.
Can we assume that all these points fall on the same half of the circle? (Otherwise, there is no obvious way to define the "average angle". Think of two points on the diameter, e.g. 0 deg and 180 deg --- is the average 90 deg or 270 deg? What happens when we have 3 or more evenly spread out points?)
With this assumption, we pick an arbitrary point on that semicircle as the "origin", and measure the given set of angles with respect to this origin (call this the "relative angle"). Note that the relative angle has an absolute value strictly less than 180 deg. Finally, take the mean of these relative angles to get the desired average angle (relative to our origin of course).

There's no single "right answer". I recommend reading the book,
K. V. Mardia and P. E. Jupp, "Directional Statistics", (Wiley, 1999),
for a thorough analysis.

(Just want to share my viewpoint from Estimation Theory or Statistical Inference)
Nimble's attempt is to get the MMSE^ estimate of a set of angles, but that is only one of the choices for finding an "averaged" direction; one can also find an MMAE^ estimate, or some other estimate, to be the "averaged" direction. It depends on the metric you use to quantify the error of a direction or, more generally in estimation theory, on the definition of the cost function.
^ MMSE/MMAE stands for minimum mean squared/absolute error.
ackb said "The average angle phi_avg should have the property that sum_i|phi_avg-phi_i|^2 becomes minimal...they average something, but not angles"
---- You quantify errors in the mean-squared sense, and that is one of the most common ways, but not the only way. The answer favored by most people here (i.e., sum the unit vectors and take the angle of the result) is actually one of the reasonable solutions. It is (this can be proved) the ML estimator of the "averaged" direction we want, if the directions of the vectors are modeled as a von Mises distribution. This distribution is not fancy; it is just a periodically sampled distribution from a 2D Gaussian. See Eqn. (2.179) in Bishop's book "Pattern Recognition and Machine Learning". Again, it is by no means the only best way to represent an "average" direction; however, it is a quite reasonable one that has both good theoretical justification and a simple implementation.
Nimble said "ackb is right that these vector based solutions cannot be considered true averages of angles, they are only an average of the unit vector counterparts"
---- This is not true. The "unit vector counterparts" reveal the information about the direction of a vector. The angle is a quantity that does not consider the length of the vector, and the unit vector is something with the additional information that the length is 1. You could define your "unit" vector to be of length 2; it does not really matter.

You can see a solution and a little explanation in the following link, for ANY programming language:
https://rosettacode.org/wiki/Averages/Mean_angle
For instance, the C solution:
#include<math.h>
#include<stdio.h>
double
meanAngle (double *angles, int size)
{
double y_part = 0, x_part = 0;
int i;
for (i = 0; i < size; i++)
{
x_part += cos (angles[i] * M_PI / 180);
y_part += sin (angles[i] * M_PI / 180);
}
return atan2 (y_part / size, x_part / size) * 180 / M_PI;
}
int
main ()
{
double angleSet1[] = { 350, 10 };
double angleSet2[] = { 90, 180, 270, 360};
double angleSet3[] = { 10, 20, 30};
printf ("\nMean Angle for 1st set : %lf degrees", meanAngle (angleSet1, 2));
printf ("\nMean Angle for 2nd set : %lf degrees", meanAngle (angleSet2, 4));
printf ("\nMean Angle for 3rd set : %lf degrees\n", meanAngle (angleSet3, 3));
return 0;
}
Output:
Mean Angle for 1st set : -0.000000 degrees
Mean Angle for 2nd set : -90.000000 degrees
Mean Angle for 3rd set : 20.000000 degrees
Or Matlab solution:
function u = mean_angle(phi)
u = angle(mean(exp(i*pi*phi/180)))*180/pi;
end
mean_angle([350, 10])
ans = -2.7452e-14
mean_angle([90, 180, 270, 360])
ans = -90
mean_angle([10, 20, 30])
ans = 20.000

Here is a completely arithmetic solution using moving averages and taking care to normalize values. It is fast and delivers correct answers if all angles are on one side of the circle (within 180° of each other).
It is mathematically equivalent to adding the offset which shifts the values into the range (0, 180), calculating the mean and then subtracting the offset.
The comments describe what range a specific value can take on at any given time.
// angles have to be in the range [0, 360) and within 180° of each other.
// n >= 1
// returns the circular average of the angles in the range [0, 360).
double meanAngle(double* angles, int n)
{
double average = angles[0];
for (int i = 1; i<n; i++)
{
// average: (0, 360)
double diff = angles[i]-average;
// diff: (-540, 540)
if (diff < -180)
diff += 360;
else if (diff >= 180)
diff -= 360;
// diff: (-180, 180)
average += diff/(i+1);
// average: (-180, 540)
if (average < 0)
average += 360;
else if (average >= 360)
average -= 360;
// average: (0, 360)
}
return average;
}

Well, I'm hugely late to the party but thought I'd add my 2 cents' worth as I couldn't really find any definitive answer. In the end I implemented the following Java version of the Mitsuta method which, I hope, provides a simple and robust solution, particularly as the standard deviation provides both a measure of dispersion and, if sd == 90, indicates that the input angles result in an ambiguous mean.
EDIT: Actually I realised that my original implementation can be even further simplified, in fact worryingly simple considering all the conversation and trigonometry going on in the other answers.
/**
* The Mitsuta method
*
* @param angles Angles from 0 - 360
* @return double array containing
* 0 - mean
* 1 - sd: a measure of angular dispersion, in the range [0..360], similar to standard deviation.
* Note if sd == 90 then the mean can also be its inverse, i.e. 360 == 0, 300 == 60.
*/
public static double[] getAngleStatsMitsuta(double... angles) {
double sum = 0;
double sumsq = 0;
for (double angle : angles) {
if (angle >= 180) {
angle -= 360;
}
sum += angle;
sumsq += angle * angle;
}
double mean = sum / angles.length;
return new double[]{mean <= 0 ? 360 + mean: mean, Math.sqrt(sumsq / angles.length - (mean * mean))};
}
... and for all you (Java) geeks out there, you can use the above approach to get the mean angle in one line.
Arrays.stream(angles).map(angle -> angle<180 ? angle: (angle-360)).sum() / angles.length;

Alnitak has the right solution. Nick Fortescue's solution is functionally the same.
For the special case of where
( sum(x_component) = 0.0 && sum(y_component) = 0.0 ) // e.g. 2 angles of 10. and 190. degrees ea.
use 0.0 degrees as the sum
Computationally, you have to test for this case, since atan2(0., 0.) is mathematically undefined; depending on the language it will either generate an error or silently return an arbitrary value such as 0.
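A minimal Python sketch of such a guard, with angles in degrees. The function name and the eps threshold are assumptions of mine: in floating point the sums are rarely exactly 0.0, so the test is on the length of the summed vector being very small rather than on exact zero.

```python
import math

def mean_angle_deg(angles, eps=1e-9):
    """Vector-sum mean in degrees, returning 0.0 when the vectors cancel out."""
    x = sum(math.cos(math.radians(a)) for a in angles)
    y = sum(math.sin(math.radians(a)) for a in angles)
    if math.hypot(x, y) < eps:
        return 0.0   # degenerate case, following the convention suggested above
    return math.degrees(math.atan2(y, x)) % 360.0

print(mean_angle_deg([10, 190]))   # 0.0
```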

The average angle phi_avg should have the property that sum_i|phi_avg-phi_i|^2 becomes minimal, where the difference has to be in [-Pi, Pi) (because it might be shorter to go the other way around!). This is easily achieved by normalizing all input values to [0, 2Pi), keeping a running average phi_run, and normalizing the difference phi_i-phi_run to [-Pi, Pi) (by adding or subtracting 2Pi). Most suggestions above do something else that does not have that minimal property, i.e., they average something, but not angles.
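A small Python sketch of the running-average idea, in radians. The function name is made up, and this is only my reading of the description; note that the result depends on the input order, and, as the moving-average answer above says of its own version, it should only be trusted when all angles lie within half a circle of each other.

```python
import math

def running_mean_angle(angles):
    """Running average where each step moves along the shorter way round."""
    two_pi = 2.0 * math.pi
    phi_run = angles[0] % two_pi
    for i, phi in enumerate(angles[1:], start=2):
        # Difference to the running average, normalized to [-pi, pi)
        diff = (phi - phi_run + math.pi) % two_pi - math.pi
        phi_run = (phi_run + diff / i) % two_pi
    return phi_run

print(running_mean_angle([0.2, 0.4, 0.6]))   # close to 0.4
```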

I solved the problem with the help of the answer from @David_Hanak.
As he states:
The angle that points "between" the two others while staying in the same semicircle, e.g. for 355 and 5, this would be 0, not 180. To do this, you need to check if the difference between the two angles is larger than 180 or not. If so, increment the smaller angle by 360 before using the above formula.
So what I did was calculate the average of all the angles. And then all the angles that are less than this, increase them by 360. Then recalculate the average by adding them all and dividing them by their length.
float angleY = 0f;
int count = eulerAngles.Count;
for (byte i = 0; i < count; i++)
angleY += eulerAngles[i].y;
float averageAngle = angleY / count;
angleY = 0f;
for (byte i = 0; i < count; i++)
{
float angle = eulerAngles[i].y;
if (angle < averageAngle)
angle += 360f;
angleY += angle;
}
angleY = angleY / count;
Works perfectly.

Python function:
from math import sin,cos,atan2,pi
import numpy as np
def meanangle(angles,weights=None,setting='degrees'):
    '''computes the mean angle'''
    # None (rather than 0) as the default avoids comparing an array with 0
    if weights is None:
        weights=np.ones(len(angles))
    sumsin=0
    sumcos=0
    if setting=='degrees':
        angles=np.array(angles)*pi/180
    for i in range(len(angles)):
        sumsin+=weights[i]/sum(weights)*sin(angles[i])
        sumcos+=weights[i]/sum(weights)*cos(angles[i])
    average=atan2(sumsin,sumcos)
    if setting=='degrees':
        average=average*180/pi
    return average

You can use this function in Matlab:
function retVal=DegreeAngleMean(x)
len=length(x);
sum1=0;
sum2=0;
count1=0;
count2=0;
for i=1:len
if x(i)<180
sum1=sum1+x(i);
count1=count1+1;
else
sum2=sum2+x(i);
count2=count2+1;
end
end
if (count1>0)
k1=sum1/count1;
end
if (count2>0)
k2=sum2/count2;
end
if count1>0 && count2>0
if(k2-k1 >= 180)
retVal = ((sum1+sum2)-count2*360)/len;
else
retVal = (sum1+sum2)/len;
end
elseif count1>0
retVal = k1;
else
retVal = k2;
end

While starblue's answer gives the angle of the average unit vector, it is possible to extend the concept of the arithmetic mean to angles if you accept that there may be more than one answer in the range of 0 to 2*pi (or 0° to 360°). For example, the average of 0° and 180° may be either 90° or 270°.
The arithmetic mean has the property of being the single value with the minimum sum of squared distances to the input values. The distance along the unit circle between two unit vectors can be easily calculated as the inverse cosine of their dot product. If we choose a unit vector by minimizing the sum of the squared inverse cosine of the dot product of our vector and each input unit vector then we have an equivalent average. Again, keep in mind that there may be two or more minimums in exceptional cases.
This concept could be extended to any number of dimensions, since the distance along the unit sphere can be calculated in the exact same way as the distance along the unit circle--the inverse cosine of the dot product of two unit vectors.
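As a concrete illustration of that distance, here is a short Python sketch for the circle case (angles in radians). The function name is made up, and the clamping of the dot product is my own precaution against rounding errors pushing it slightly outside [-1, 1].

```python
import math

def arc_distance(a1, a2):
    """Distance along the unit circle between directions a1 and a2 (radians)."""
    # Dot product of the two unit vectors (cos a1, sin a1) and (cos a2, sin a2)
    dot = math.cos(a1) * math.cos(a2) + math.sin(a1) * math.sin(a2)
    dot = max(-1.0, min(1.0, dot))   # guard against rounding just outside [-1, 1]
    return math.acos(dot)

print(math.degrees(arc_distance(math.radians(350), math.radians(10))))   # close to 20
```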
For circles we could solve for this average in a number of ways, but I propose the following O(n^2) algorithm (angles are in radians, and I avoid calculating the unit vectors):
var bestAverage = -1
double minimumSquareDistance
for each a1 in input
var sumA = 0;
for each a2 in input
var a = (a2 - a1) mod (2*pi) + a1
sumA += a
end for
var averageHere = sumA / input.count
var sumSqDistHere = 0
for each a2 in input
var dist = (a2 - averageHere + pi) mod (2*pi) - pi // keep within range of -pi to pi
sumSqDistHere += dist * dist
end for
if (bestAverage < 0 OR sumSqDistHere < minimumSquareDistance) // for exceptional cases, sumSqDistHere may be equal to minimumSquareDistance at least once. In these cases we will only find one of the averages
minimumSquareDistance = sumSqDistHere
bestAverage = averageHere
end if
end for
return bestAverage
If all the angles are within 180° of each other, then we could use a simpler O(n)+O(sort) algorithm (again using radians and avoiding use of unit vectors):
sort(input)
var largestGapEnd = input[0]
var largestGapSize = (input[0] - input[input.count-1]) mod (2*pi)
for (int i = 1; i < input.count; ++i)
var gapSize = (input[i] - input[i - 1]) mod (2*pi)
if (largestGapEnd < 0 OR gapSize > largestGapSize)
largestGapSize = gapSize
largestGapEnd = input[i]
end if
end for
double sum = 0
for each angle in input
var a2 = (angle - largestGapEnd) mod (2*pi) + largestGapEnd
sum += a2
end for
return sum / input.count
To use degrees, simply replace pi with 180. If you plan to use more dimensions then you will most likely have to use an iterative method to solve for the average.

The problem is extremely simple.
1. Make sure all angles are between -180 and 180 degrees.
2a. Add all non-negative angles, take their average, and COUNT how many there are.
2b. Add all negative angles, take their average, and COUNT how many there are.
3. Take the difference of pos_average minus neg_average.
If difference is greater than 180 then change difference to 360 minus difference. Otherwise just change the sign of difference. Note that difference is always non-negative.
The Average_Angle equals the pos_average plus difference times the "weight", i.e. the negative count divided by the sum of the negative and positive counts.

Here is some Java code to average angles; I think it's reasonably robust.
public static double getAverageAngle(List<Double> angles)
{
// r = right (0 to 180 degrees)
// l = left (180 to 360 degrees)
double rTotal = 0;
double lTotal = 0;
double rCtr = 0;
double lCtr = 0;
for (Double angle : angles)
{
double norm = normalize(angle);
if (norm >= 180)
{
lTotal += norm;
lCtr++;
} else
{
rTotal += norm;
rCtr++;
}
}
double rAvg = rTotal / Math.max(rCtr, 1.0);
double lAvg = lTotal / Math.max(lCtr, 1.0);
if (rAvg > lAvg + 180)
{
lAvg += 360;
}
if (lAvg > rAvg + 180)
{
rAvg += 360;
}
double rPortion = rAvg * (rCtr / (rCtr + lCtr));
double lPortion = lAvg * (lCtr / (lCtr + rCtr));
return normalize(rPortion + lPortion);
}
public static double normalize(double angle)
{
double result = angle;
if (angle >= 360)
{
result = angle % 360;
}
if (angle < 0)
{
result = 360 + (angle % 360);
}
return result;
}
