Graph (Chart) Algorithm

Does anyone have a decent algorithm for calculating axis minima and maxima?
When creating a chart for a given set of data items, I'd like to be able to give the algorithm:
the maximum (y) value in the set
the minimum (y) value in the set
the number of tick marks to appear on the axis
an optional value that must appear as a tick (e.g. zero when showing +ve and -ve values)
The algorithm should return
the largest axis value
the smallest axis value (although that could be inferred from the largest, the interval size and the number of ticks)
the interval size
The ticks should be at a regular interval and should be of a "reasonable" size (e.g. 1, 3, 5, possibly even 2.5, but not any more significant figures than that).
The presence of the optional value will skew this, but without that value the largest item should appear between the top two tick marks, the lowest value between the bottom two.
This is a language-agnostic question, but if there's a C#/.NET library around, that would be smashing ;)

OK, here's what I came up with for one of our applications. Note that it doesn't deal with the "optional value" scenario you mention, since our optional value is always 0, but it shouldn't be hard for you to modify.
Data is continually added to the series, so we just keep the range of y values up to date by inspecting each data point as it's added; this is very inexpensive and easy to keep track of. Equal minimum and maximum values are special-cased: a spacing of 0 indicates that no markers should be drawn.
This solution isn't dissimilar to Andrew's suggestion above, except that it deals, in a slightly kludgy way, with some arbitrary fractions of the exponent multiplier.
Lastly, this sample is in C#. Hope it helps.
private float GetYMarkerSpacing()
{
    YValueRange range = m_ScrollableCanvas.
        TimelineCanvas.DataModel.CurrentYRange;
    if ( range.RealMinimum == range.RealMaximum )
    {
        return 0;
    }
    float absolute = Math.Max(
              Math.Abs( range.RealMinimum ),
              Math.Abs( range.RealMaximum ) ),
          spacing = 0;
    for ( int power = 0; power < 39; ++power )
    {
        float temp = ( float ) Math.Pow( 10, power );
        if ( temp <= absolute )
        {
            spacing = temp;
        }
        else if ( temp / 2 <= absolute )
        {
            spacing = temp / 2;
            break;
        }
        else if ( temp / 2.5 <= absolute )
        {
            spacing = temp / 2.5F;
            break;
        }
        else if ( temp / 4 <= absolute )
        {
            spacing = temp / 4;
            break;
        }
        else if ( temp / 5 <= absolute )
        {
            spacing = temp / 5;
            break;
        }
        else
        {
            break;
        }
    }
    return spacing;
}

I've been using the jQuery flot graph library. It's open source and does axis/tick generation quite well. I'd suggest looking at its code and pinching some ideas from there.

I can recommend the following:
Set a visually appealing minimum number of major lines. This will depend on the nature of the data that you're presenting and the size of the plot you're doing, but 7 is a pretty good number.
Choose the exponent and the multiplier based on a progression of 1, 2, 5, 10, etc. that will give you at least the minimum number of major lines (i.e. (max-min)/(scale x 10^exponent) >= minimum_tick_marks).
Find the minimum integer multiple of your exponent and multiplier that fits within your range. This will be the first major tick. The rest of the ticks are derived from this.
This was used for an application that allowed arbitrary scaling of data and seemed to work well; a rough sketch follows below.
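As a rough sketch of those three steps in Python (the function names and the default of 7 major lines are my own assumptions, not a library API):

import math

def tick_interval(lo, hi, min_lines=7):
    # pick a step from a descending 1-2-5 progression that still
    # yields at least min_lines major lines across [lo, hi]
    raw = (hi - lo) / min_lines                # largest spacing allowed
    exponent = math.floor(math.log10(raw))
    for multiplier in (5, 2, 1):
        step = multiplier * 10 ** exponent
        if (hi - lo) / step >= min_lines:
            return step
    return 10 ** exponent

def first_tick(lo, step):
    # minimum integer multiple of the step that fits within the range
    return math.ceil(lo / step) * step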

Related

How to "tie" 3 RSI-indicators to one Bollinger Band, using IMAonArray()?

There are 3 RSI indicators, each having its own period.
I want to tie all 3 to one Bollinger Band.
How can I do this better?
for(i=limit; i>=0; i--) {
    ma    = iMAOnArray(RSI,0,bb_period,0,0,i);     // middle
    stdev = iStdDevOnArray(RSI,0,bb_period,0,0,i); // deviation
    BBUP[i]   = ma + bb_dev*stdev;                 // up
    BBDOWN[i] = ma - bb_dev*stdev;                 // down
    Buff4[i] = 0;
    Buff5[i] = 0;
}
if(limit < Bars-1) limit++;
for(i=limit; i>0; i--) {
    if(PrevSignal >= 0) {
        if(   RSI[i]    < BBDOWN[i]
           && RSI[i+1]  < BBUP[i+1]
           && RSI2[i]   < BBDOWN[i]
           && RSI2[i+1] < BBUP[i+1]
           && RSI3[i]   < BBDOWN[i]
           && RSI3[i+1] < BBUP[i+1]
           ) {
            Buff4[i] = Low[i]-5*MarketInfo(Symbol(),MODE_POINT); // MathMin(BBDOWN[i],WPR[i]); // Low[i]-5*MarketInfo(Symbol(),MODE_POINT);
            PrevSignal = -1;
        }
    }
}
I want a signal when all 3 RSIs are below the BB.
https://i.stack.imgur.com/fKff7.jpg
The code AS-IS could be designed with 4x less overhead:
Even before the O/P defines the target ( how to tie three values into one ), one might have already detected that the code-blocks are sync-stepped / aligned in [i]-indexing, not peeking into the future, non-intervening each to the other's values ( principally independent ) and thus redundant, so one could safely save a lot from these "shared" overheads:
int    i;
double ma,
       stdev;

for(i=limit; i>=0; i--) {
    RSI[ i] = iRSI( Symbol(), Period(), rsi_period,   PRICE_CLOSE, i );
    RSI2[i] = iRSI( Symbol(), Period(), rsi_period_2, PRICE_CLOSE, i );
    RSI3[i] = iRSI( Symbol(), Period(), rsi_period_3, PRICE_CLOSE, i );
    /* the "last" for(){...} code-block body is not included,
       until the calculus of how to tie three vectors
       into a common Bollinger Band is defined by O/P,
       but it could fit in this common "stream"-processing too
     */
}
The O/P code has been remarkably changed - you may check the revisions.
Nota bene: an ad-hoc note on the presented update follows.
Given the Quantitative Finance heritage, from J. Welles Wilder, the author of the said RSI model, the definition says:
Relative Strength Index (RSI) is a momentum oscillator that measures the speed and change of price movements. RSI oscillates between zero and 100.
It turns out, that any attempt to mix apples and oranges leads to just a principal confusion. Taking the apples == the Bollinger Band ( having a clear PriceDOMAIN dimension [CCY2] ) and trying to mix it ( additively, using { ADD | SUB } operations ) with oranges == RSI ( a dimension-less relative / percent indicator ) will yield an uninterpretable result ( has a hidden superposition + scaling uncertainties ).
Still in doubts?
Just let's take XAUUSD for a second. Given its Bollinger Band values, bearing in mind any reasonable multiple of sigma ( StDev ), the RSI will always be under the lower band... thus such a construct will have zero information value, as: RSI in <0..100> << BollingerBand_LOWER ~ ( 1257.000 - N * sigma )
So the model ought to be revised, so as to have some Quantitative Finance support built-in. Mixing apples with oranges simply will not help in any serious business sense.

How to convert between Hilbert Curve QuadTree and S2 Geometry CellId

Problem
Let's say I know the Hilbert Curve Face and Quadtree, such as 4/032212303102122 (face 4, level 15).
Or perhaps I know the S2 Geometry CellId, such as 9749618424903892992.
How can I convert from the one to the other?
Application
(this is the kind of thing you need to do for Pokemon GO and Ingress maps)
Exploration
I'm trying to do this in JavaScript and a library exists for manipulating 64-bit integers (long.js) as well as for S2CellIds (s2-geometry.js).
Also, I'm feeling pretty good about walking the Hilbert curve simply by adding or subtracting the base-four numbers (except when crossing faces, but that happens rarely enough that I'll be fine... for a while...), just not sure how to go back and forth with the 64-bit id.
It turns out that it's much, much, much easier to do it with strings than with binary - and since this is JavaScript, where bit-shifting with long.js would take significantly more time, it's actually faster!
Code Example:
From s2-geometry-javascript:
'use strict';

var Long = require('long');

var S2 = {};

S2.FACE_BITS = 3;
S2.MAX_LEVEL = 30;
S2.POS_BITS = (2 * S2.MAX_LEVEL) + 1;

S2.fromFacePosLevel = function (faceN, posS, levelN) {
    var Long = exports.dcodeIO && exports.dcodeIO.Long || require('long');
    if (!levelN) {
        levelN = posS.length;
    }
    if (posS.length > levelN) {
        posS = posS.substr(0, levelN);
    }

    var posB = Long.fromString(posS, true, 4).toString(2);
    while (posB.length < (2 * levelN)) {
        posB = '0' + posB;
    }

    var bin = Long.fromString(faceN.toString(10), true, 10).toString(2);
    while (bin.length < S2.FACE_BITS) {
        bin = '0' + bin;
    }

    bin += posB;
    bin += '1';

    while (bin.length < (S2.FACE_BITS + S2.POS_BITS)) {
        bin += '0';
    }

    return Long.fromString(bin, true, 2).toString(10);
};
Explanation:
Here's a quick 'n' dirty breakdown of the bits
id encoding
Note that + means concat and NOT add
(padding + face bits) + (padding + position bits) + (lsb marker + padding)
// quadkey 4/032212303102210
// id (base 10) 9749618446378729472
// base 4 10 032212303102210 1000000000000000
// base 2 100 001110100110110011010010100100 1000000000000000000000000000000
face encoding
"human readable" form is base 10
3-bit - i.e. an unfolded 6-sided cube with base 10 face representations of 0,1,2,3,4,5
6 and 7 are unused and invalid
3 binary characters - i.e. 000, 001, 010, 011, 100, 101
110 and 111 are unused and invalid
left-padded to 3-bits with '0's (i.e. 001)
position encoding
"human readable" form is base 4 (quadkey)
61-bit
60 data bits, 1 bit for lsb marker
left-padded to LEVEL with '0's (i.e. 00322130 for level 8)
level encoding
"human readable" form is base 10
the length of hilbert curve quadkey / quadtree string is the level
calculated from the least significant bit in binary form
lsb (least-significant bit) marker is '1', just to right of position
right-padded to MAX_LEVEL*2 (after the lsb marker) with '0's
(i.e. '1' for level 30, '1000' for level 27)
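To make the layout above concrete, here is a hedged Python sketch that assembles an id from a face and a quadkey by following exactly that bit recipe (the function name is mine; the constants mirror the library's):

FACE_BITS = 3
MAX_LEVEL = 30
POS_BITS = 2 * MAX_LEVEL + 1   # 60 data bits plus the lsb marker

def from_face_pos_level(face, quadkey):
    face_bin = format(face, '03b')                             # 3 face bits
    pos_bin = ''.join(format(int(d), '02b') for d in quadkey)  # 2 bits per level
    bits = face_bin + pos_bin + '1'                            # append the lsb marker
    bits = bits.ljust(FACE_BITS + POS_BITS, '0')               # right-pad with '0's
    return int(bits, 2)

# quadkey 4/032212303102210 should give 9749618446378729472 (see the breakdown above)
print(from_face_pos_level(4, '032212303102210'))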

Efficient way to generate a seemingly random permutation from a very large set without repeating?

I have a very large set (billions of elements or more; it's expected to grow exponentially to some level), and I want to generate seemingly random elements from it without repeating. I know I can pick a random number, repeat, and record the elements I have generated, but that takes more and more memory as numbers are generated, and wouldn't be practical after a couple million elements.
I mean, I could count 1, 2, 3, and so on up to billions, and each step would be constant time without remembering all the previous ones; or I could count 1, 3, 5, 7, 9 and then 2, 4, 6, 8, 10; but is there a more sophisticated way to do that and eventually get a seemingly random permutation of that set?
Update
1. The set does not change size during the generation process. I meant that when the user's input increases linearly, the size of the set increases exponentially.
2. In short, the set is like the set of every integer from 1 to 10 billion or more.
3. At length, it goes up to 10 billion because each element carries the information of many independent choices. For example, imagine an RPG character that has 10 attributes, each of which can go from 1 to 100 (for my problem, different choices can have different ranges), so there are 10^20 possible characters. The number "10873456879326587345" would correspond to a character that has "11, 88, 35, ...", and I would like an algorithm to generate them one by one without repeating, but make it look random.
Thanks for the interesting question. You can create a "pseudorandom"* (cyclic) permutation with a few bytes using modular exponentiation. Say we have n elements. Search for a prime p that's bigger than n+1. Then find a primitive root g modulo p. Basically, by the definition of a primitive root, the action x --> (g * x) % p is a cyclic permutation of {1, ..., p-1}. And so x --> ((g * (x+1)) % p) - 1 is a cyclic permutation of {0, ..., p-2}. We can get a cyclic permutation of {0, ..., n-1} by repeating the previous permutation whenever it gives a value bigger than (or equal to) n.
I implemented this idea as a Go package. https://github.com/bwesterb/powercycle
package main
import (
"fmt"
"github.com/bwesterb/powercycle"
)
func main() {
var x uint64
cycle := powercycle.New(10)
for i := 0; i < 10; i++ {
fmt.Println(x)
x = cycle.Apply(x)
}
}
This outputs something like
0
6
4
1
2
9
3
5
8
7
but that might vary, of course, depending on the generator chosen.
It's fast, but not super-fast: on my five-year-old i7 it takes less than 210ns to compute one application of a cycle on 1000000000000000 elements. More details:
BenchmarkNew10-8 1000000 1328 ns/op
BenchmarkNew1000-8 500000 2566 ns/op
BenchmarkNew1000000-8 50000 25893 ns/op
BenchmarkNew1000000000-8 200000 7589 ns/op
BenchmarkNew1000000000000-8 2000 648785 ns/op
BenchmarkApply10-8 10000000 170 ns/op
BenchmarkApply1000-8 10000000 173 ns/op
BenchmarkApply1000000-8 10000000 172 ns/op
BenchmarkApply1000000000-8 10000000 169 ns/op
BenchmarkApply1000000000000-8 10000000 201 ns/op
BenchmarkApply1000000000000000-8 10000000 204 ns/op
Why did I say "pseudorandom"? Well, we are always creating a very specific kind of cycle: namely one that uses modular exponentiation. It looks pretty pseudorandom though.
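For a language-agnostic picture of the same idea, here is my own minimal Python sketch (not the powercycle package itself); it assumes sympy is available for the prime and primitive-root searches:

from sympy import nextprime, primitive_root

def make_cycle(n):
    # cyclic "pseudorandom" permutation of {0..n-1} via a primitive root
    p = nextprime(n + 1)        # prime p bigger than n+1
    g = primitive_root(p)       # x -> (g*x) % p cycles through {1..p-1}
    def step(x):
        while True:
            x = (g * (x + 1)) % p - 1   # cyclic permutation of {0..p-2}
            if x < n:                   # skip values outside {0..n-1}
                return x
    return step

step = make_cycle(10)
x = 0
for _ in range(10):   # visits every element of {0..9} exactly once
    print(x)
    x = step(x)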
I would use a random number and swap it with an element at the beginning of the set.
Here's some pseudo-code:

set = [1, 2, 3, 4, 5, 6]
picked = 0

Function PickNext(set, picked)
    If picked > Len(set) - 1 Then
        Return Nothing
    End If

    // random number between picked (inclusive) and length (exclusive)
    r = RandomInt(picked, Len(set))

    // swap the picked element to the beginning of the set
    result = set[r]
    set[r] = set[picked]
    set[picked] = result

    // update picked
    picked++

    // return your next random element
    Return result
End Function
Every time you pick an element there is one swap, and the only extra memory being used is the picked variable. The swap works whether the elements are in a database or in memory.
EDIT Here's a jsfiddle of a working implementation http://jsfiddle.net/sun8rw4d/
JavaScript
var set = [];
set.picked = 0;

function pickNext(set) {
    if (set.picked > set.length - 1) { return null; }

    var r = set.picked + Math.floor(Math.random() * (set.length - set.picked));

    var result = set[r];
    set[r] = set[set.picked];
    set[set.picked] = result;

    set.picked++;

    return result;
}

// testing
for (var i = 0; i < 100; i++) {
    set.push(i);
}

while (pickNext(set) !== null) { }

document.body.innerHTML += set.toString();
EDIT 2 Finally, a random binary walk of the set. This can be accomplished with O(log2(N)) stack space (memory), which for 10 billion is only 33. There's no shuffling or swapping involved. Using ternary instead of binary might yield even better pseudo-random results.
// on-the-fly set generator
var count = 0;
var maxValue = 64;

function nextElement() {
    // restart the generation
    if (count == maxValue) {
        count = 0;
    }
    return count++;
}

// code to pseudo-randomly select elements
var current = 0;
var stack = [0, maxValue - 1];

function randomBinaryWalk() {
    if (stack.length == 0) { return null; }

    var high = stack.pop();
    var low = stack.pop();
    var mid = ((high + low) / 2) | 0;

    // pseudo-randomly choose the next path
    if (Math.random() > 0.5) {
        if (low <= mid - 1) {
            stack.push(low);
            stack.push(mid - 1);
        }
        if (mid + 1 <= high) {
            stack.push(mid + 1);
            stack.push(high);
        }
    } else {
        if (mid + 1 <= high) {
            stack.push(mid + 1);
            stack.push(high);
        }
        if (low <= mid - 1) {
            stack.push(low);
            stack.push(mid - 1);
        }
    }

    // how many elements to skip
    var toMid = (current < mid ? mid - current : (maxValue - current) + mid);

    // skip elements
    for (var i = 0; i < toMid - 1; i++) {
        nextElement();
    }
    current = mid;

    // get result
    return nextElement();
}

// test
var result;
var list = [];
do {
    result = randomBinaryWalk();
    list.push(result);
} while (result !== null);
document.body.innerHTML += '<br/>' + list.toString();
Here are the results from a couple of runs with a small set of 64 elements. JSFiddle http://jsfiddle.net/yooLjtgu/
30,46,38,34,36,35,37,32,33,31,42,40,41,39,44,45,43,54,50,52,53,51,48,47,49,58,60,59,61,62,56,57,55,14,22,18,20,19,21,16,15,17,26,28,29,27,24,25,23,6,2,4,5,3,0,1,63,10,8,7,9,12,11,13
30,14,22,18,16,15,17,20,19,21,26,28,29,27,24,23,25,6,10,8,7,9,12,13,11,2,0,63,1,4,5,3,46,38,42,44,45,43,40,41,39,34,36,35,37,32,31,33,54,58,56,55,57,60,59,61,62,50,48,49,47,52,51,53
As I mentioned in my comment, unless you have an efficient way to skip to a specific point in your "on the fly" generation of the set this will not be very efficient.
If it is enumerable, then use a pseudo-random integer generator adjusted to the period 0 .. 2^n - 1, where the upper bound is just greater than the size of your set, and generate pseudo-random integers, discarding those greater than the size of your set. Use those integers to index items from your set.
Pre-compute a series of indices (e.g. in a file) which has the properties you need, then randomly choose a start index for your enumeration and use the series in a round-robin manner.
The length of your pre-computed series should be > the maximum size of the set.
If you combine this (depending on your programming language etc.) with file mappings, your final nextIndex(INOUT state) function is (nearly) as simple as return mappedIndices[state++ % PERIOD];, if you have a fixed size of each entry (e.g. 8 bytes -> uint64_t).
Of course, the returned value could be > your current set size. Simply draw indices until you get one which is <= your set's current size.
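A minimal Python sketch of this scheme (the in-memory list stands in for the file mapping; PERIOD and the names are assumptions for illustration):

import random

PERIOD = 1 << 20                      # must exceed the set's maximum size
mapped_indices = list(range(PERIOD))  # pre-computed once, e.g. stored in a file
random.shuffle(mapped_indices)

state = random.randrange(PERIOD)      # random start index

def next_index(current_set_size):
    # draw pre-computed indices round-robin until one fits the current set
    global state
    while True:
        idx = mapped_indices[state % PERIOD]
        state += 1
        if idx < current_set_size:
            return idx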
Update (In response to question-update):
There is another option to achieve your goal if it is about creating 10 billion unique characters in your RPG: generate a GUID and write yourself a function which computes your number from the GUID. man uuid if you are on a unix system, else google it. Some parts of the uuid are not random but contain meta-info; some parts are either systematic (such as your network card's MAC address) or random, depending on the generator algorithm. But they are very, very most likely unique. So, whenever you need a new unique number, generate a uuid and transform it to your number by means of some algorithm which basically maps the uuid bytes to your number in a non-trivial way (e.g. use hash functions).
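As a sketch of the GUID idea in Python (the mapping function is hypothetical; note that once you hash the UUID, uniqueness becomes merely overwhelmingly probable rather than guaranteed):

import hashlib
import uuid

def next_unique_number(upper=10**20):
    # map a fresh UUID onto [0, upper) via a hash (hypothetical mapping)
    digest = hashlib.sha256(uuid.uuid4().bytes).digest()
    return int.from_bytes(digest[:16], 'big') % upper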

Reasonable optimized chart scaling

I need to make a chart with an optimized y axis maximum value.
The current method I have of making charts simply uses the maximum value of all the graphs, then divides it by ten, and uses that as grid lines. I didn't write it.
Update Note: These graphs have been changed. As soon as I fixed the code, my dynamic graphs started working, making this question nonsensical (because the examples no longer had any errors in them). I've updated these with static images, but some of the answers reference different values. Keep that in mind.
There were between 12003 and 14003 inbound calls so far in February. Informative, but ugly.
I'd like to avoid charts that look like a monkey came up with the y-axis numbers.
Using the Google charts API helps a little bit, but it's still not quite what I want.
The numbers are clean, but the top of the y value is always the same as the maximum value on the chart. This chart scales from 0 to 1357. I need to have calculated the proper value of 1400, programmatically.
I'm throwing in rbobby's definition of a 'nice' number here because it explains it so well.
A "nice" number is one that has 3 or fewer non-zero digits (e.g. 1230000)
A "nice" number has the same number of, or fewer, non-zero digits than zero digits (e.g. 1230 is not nice, 1200 is nice)
The nicest numbers are ones with multiples of 3 zeros (e.g. "1,000", "1,000,000")
The second nicest numbers are ones with multiples of 3 zeros plus 2 zeros (e.g. "1,500,000", "1,200")
Solution
I found the way to get the results that I want using a modified version of Mark Ransom's idea.
First, Mark Ransom's code determines the optimum spacing between ticks, when given the number of ticks. Sometimes this number ends up being more than twice what the highest value on the chart is, depending on how many grid lines you want.
What I'm doing is I'm running Mark's code with 5, 6, 7, 8, 9, and 10 grid lines (ticks) to find which of those is the lowest. With a value of 23, the height of the chart goes to 25, with a grid line at 5, 10, 15, 20, and 25. With a value of 26, the chart's height is 30, with grid lines at 5, 10, 15, 20, 25, and 30. It has the same spacing between grid lines, but there are more of them.
So here are the steps to just-about copy what Excel does to make charts all fancy.
Temporarily bump up the chart's highest value by about 5% (so that there is always some space between the chart's highest point and the top of the chart area. We want 99.9 to round up to 120)
Find the optimum grid line placement for 5, 6, 7, 8, 9, and 10 grid lines.
Pick out the lowest of those numbers. Remember the number of grid lines it took to get that value.
Now you have the optimum chart height. The lines/bar will never butt up against the top of the chart and you have the optimum number of ticks.
PHP:
function roundUp($maxValue){
    $optiMax = $maxValue * 2;
    for ($i = 5; $i <= 10; $i++){
        $tmpMaxValue = bestTick($maxValue, $i);
        if (($optiMax > $tmpMaxValue) and ($tmpMaxValue > ($maxValue + $maxValue * 0.05))){
            $optiMax = $tmpMaxValue;
            $optiTicks = $i;
        }
    }
    return $optiMax;
}

function bestTick($maxValue, $mostTicks){
    $minimum = $maxValue / $mostTicks;
    $magnitude = pow(10, floor(log($minimum) / log(10)));
    $residual = $minimum / $magnitude;
    if ($residual > 5){
        $tick = 10 * $magnitude;
    } elseif ($residual > 2) {
        $tick = 5 * $magnitude;
    } elseif ($residual > 1){
        $tick = 2 * $magnitude;
    } else {
        $tick = $magnitude;
    }
    return ($tick * $mostTicks);
}
Python:
import math

def BestTick(largest, mostticks):
    minimum = largest / mostticks
    magnitude = 10 ** math.floor(math.log(minimum) / math.log(10))
    residual = minimum / magnitude
    if residual > 5:
        tick = 10 * magnitude
    elif residual > 2:
        tick = 5 * magnitude
    elif residual > 1:
        tick = 2 * magnitude
    else:
        tick = magnitude
    return tick

value = int(input(""))
optMax = value * 2
for i in range(5, 11):
    maxValue = BestTick(value, i) * i
    print(maxValue)
    if (optMax > maxValue) and (maxValue > value + (value * .05)):
        optMax = maxValue
        optTicks = i

print("\nTest Value: " + str(value + (value * .05)) +
      "\n\nChart Height: " + str(optMax) + " Ticks: " + str(optTicks))
This is from a previous similar question:
Algorithm for "nice" grid line intervals on a graph
I've done this with kind of a brute force method. First, figure out the maximum number of tick marks you can fit into the space. Divide the total range of values by the number of ticks; this is the minimum spacing of the tick. Now calculate the floor of the logarithm base 10 to get the magnitude of the tick, and divide by this value. You should end up with something in the range of 1 to 10. Simply choose the round number greater than or equal to the value and multiply it by the magnitude calculated earlier. This is your final tick spacing.
Example in Python:
import math

def BestTick(largest, mostticks):
    minimum = largest / mostticks
    magnitude = 10 ** math.floor(math.log(minimum) / math.log(10))
    residual = minimum / magnitude
    if residual > 5:
        tick = 10 * magnitude
    elif residual > 2:
        tick = 5 * magnitude
    elif residual > 1:
        tick = 2 * magnitude
    else:
        tick = magnitude
    return tick
You could round up to two significant figures. The following pseudocode should work:
// maxValue is the largest value in your chart
magnitude = floor(log10(maxValue))
base = 10^(magnitude - 1)
chartHeight = ceiling(maxValue / base) * base
For example, if maxValue is 1357, then magnitude is 3 and base is 100. Dividing by 100, rounding up, and multiplying by 100 has the result of rounding up to the next multiple of 100, i.e. rounding up to two significant figures. In this case, the result is 1400 (1357 ⇒ 13.57 ⇒ 14 ⇒ 1400).
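For what it's worth, a direct Python translation of that pseudocode:

import math

def round_up_two_sig_figs(max_value):
    magnitude = math.floor(math.log10(max_value))
    base = 10 ** (magnitude - 1)
    return math.ceil(max_value / base) * base

print(round_up_two_sig_figs(1357))   # 1400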
In the past I've done this in a brute force-ish sort of way. Here's a chunk of C++ code that works well... but for a hardcoded lower and upper limits (0 and 5000):
int PickYUnits()
{
    int MinSize[8]      = {20, 20, 20, 20, 20, 20, 20, 20};
    int ItemsPerUnit[8] = {5, 10, 20, 25, 50, 100, 250, 500};
    int ItemLimits[8]   = {20, 50, 100, 250, 500, 1000, 2500, 5000};
    int MaxNumUnits = 8;
    double PixelsPerY;
    int PixelsPerAxis;
    int Units;

    //
    // Figure out the max from the dataset
    // - Min is always 0 for a bar chart
    //
    m_MinY = 0;
    m_MaxY = -9999999;
    m_TotalY = 0;
    for (int j = 0; j < m_DataPoints.GetSize(); j++) {
        if (m_DataPoints[j].m_y > m_MaxY) {
            m_MaxY = m_DataPoints[j].m_y;
        }
        m_TotalY += m_DataPoints[j].m_y;
    }

    //
    // Give some space at the top
    //
    m_MaxY = m_MaxY + 1;

    //
    // Figure out the size of the range
    //
    double yRange = (m_MaxY - m_MinY);

    //
    // Pick the initial size
    //
    Units = MaxNumUnits;
    for (int k = 0; k < MaxNumUnits; k++)
    {
        if (yRange < ItemLimits[k])
        {
            Units = k;
            break;
        }
    }

    //
    // Adjust it upwards based on the space available
    //
    PixelsPerY = m_rcGraph.Height() / yRange;
    PixelsPerAxis = (int)(PixelsPerY * ItemsPerUnit[Units]);
    while (PixelsPerAxis < MinSize[Units]){
        Units += 1;
        PixelsPerAxis = (int)(PixelsPerY * ItemsPerUnit[Units]);
        if (Units == 5)
            break;
    }
    return ItemsPerUnit[Units];
}
However something in what you've said tweaked me. To pick nice axis numbers a definition of "nice number" would help:
A "nice" number is one that has 3 or fewer non-zero digits (eg. 1230000)
A "nice" number has the same or few non-zero digits than zero digits (eg 1230 is not nice, 1200 is nice)
The nicest numbers are ones with multiples of 3 zeros (eg. "1,000", "1,000,000")
The second nicest numbers are onces with multples of 3 zeros plus 2 zeros (eg. "1,500,000", "1,200")
Not sure if the above definition is "right" or actually helpful (but with the definition in hand it then becomes a simpler task to devise an algorithm).
A slight refinement and tested... (works for fractions of units and not just integers)
public void testNumbers() {
    double test = 0.20000;
    double multiple = 1;
    int scale = 0;
    String[] prefix = new String[]{"", "m", "u", "n"};
    while (Math.log10(test) < 0) {
        multiple = multiple * 1000;
        test = test * 1000;
        scale++;
    }
    double tick;
    double minimum = test / 10;
    double magnitude = 100000000;
    while (minimum <= magnitude){
        magnitude = magnitude / 10;
    }
    double residual = test / (magnitude * 10);
    if (residual > 5) {
        tick = 10 * magnitude;
    } else if (residual > 2) {
        tick = 5 * magnitude;
    } else if (residual > 1) {
        tick = 2 * magnitude;
    } else {
        tick = magnitude;
    }
    double curAmt = 0;
    int ticks = (int) Math.ceil(test / tick);
    for (int ix = 0; ix < ticks; ix++) {
        curAmt += tick;
        // setScale returns a new BigDecimal, so keep the result
        BigDecimal bigDecimal = new BigDecimal(curAmt).setScale(2, BigDecimal.ROUND_HALF_UP);
        System.out.println(bigDecimal.stripTrailingZeros().toPlainString() + prefix[scale] + "s");
    }
    System.out.println("Value = " + test + prefix[scale] + "s");
    System.out.println("Tick = " + tick + prefix[scale] + "s");
    System.out.println("Ticks = " + ticks);
    System.out.println("Scale = " + multiple + " : " + scale);
}
If you want 1400 at the top, how about adjusting the last two parameters to 1400 instead of 1357:
You could use div and mod. For example.
Let's say you want your chart to round up by increments of 20 (just to make it a more arbitrary number than your typical "10" value).
So I would assume that 1, 11, 18 would all round up to 20. But 21, 33, 38 would round to 40.
To come up with the right value, do the following (where divisor is your rounding increment):

divisor = 20
multiple = maxValue / divisor; // Do an integer divide here.
if (maxValue modulus divisor > 0)
    multiple++;
graphMax = multiple * divisor;

So now let's plug in real numbers:

divisor = 20;
multiple = 33 / 20; (integer divide)
so multiple = 1
if (33 modulus 20 > 0) (it is.. it equals 13)
    multiple++;
so multiple = 2;
graphMax = multiple (2) * divisor (20);
graphMax = 40;
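The same recipe in a few lines of Python, using the corrected graphMax formula:

def round_up_to_increment(max_value, divisor=20):
    multiple = max_value // divisor   # integer divide
    if max_value % divisor > 0:
        multiple += 1
    return multiple * divisor

print(round_up_to_increment(33))   # 40
print(round_up_to_increment(18))   # 20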

How do you calculate the average of a set of circular data?

I want to calculate the average of a set of circular data. For example, I might have several samples from the reading of a compass. The problem of course is how to deal with the wraparound. The same algorithm might be useful for a clockface.
The actual question is more complicated - what do statistics mean on a sphere or in an algebraic space which "wraps around", e.g. the additive group mod n. The answer may not be unique, e.g. the average of 359 degrees and 1 degree could be 0 degrees or 180, but statistically 0 looks better.
This is a real programming problem for me and I'm trying to make it not look like just a Math problem.
Compute unit vectors from the angles and take the angle of their average.
This question is examined in detail in the book:
"Statistics On Spheres", Geoffrey S. Watson, University of Arkansas Lecture
Notes in the Mathematical Sciences, 1983 John Wiley & Sons, Inc. as mentioned at http://catless.ncl.ac.uk/Risks/7.44.html#subj4 by Bruce Karsh.
A good way to estimate an average angle, A, from a set of angle measurements
a[i] 0<=i
sum_i_from_1_to_N sin(a[i])
a = arctangent ---------------------------
sum_i_from_1_to_N cos(a[i])
The method given by starblue is computationally equivalent, but his reasons are clearer and probably programmatically more efficient, and also work well in the zero case, so kudos to him.
The subject is now explored in more detail on Wikipedia, and with other uses, like fractional parts.
I see the problem - for example, if you have a 45° angle and a 315° angle, the "natural" average would be 180°, but the value you want is actually 0°.
I think Starblue is onto something. Just calculate the (x, y) cartesian coordinates for each angle, and add those resulting vectors together. The angular offset of the final vector should be your required result.
x = y = 0
foreach angle {
    x += cos(angle)
    y += sin(angle)
}
average_angle = atan2(y, x)
I'm ignoring for now that a compass heading starts at north, and goes clockwise, whereas "normal" cartesian coordinates start with zero along the X axis, and then go anti-clockwise. The maths should work out the same way regardless.
FOR THE SPECIAL CASE OF TWO ANGLES:
The answer ( (a + b) mod 360 ) / 2 is WRONG. For angles 350 and 2, the closest point is 356, not 176.
The unit vector and trig solutions may be too expensive.
What I've got from a little tinkering is:
diff = ( ( a - b + 180 + 360 ) mod 360 ) - 180
angle = (360 + b + ( diff / 2 ) ) mod 360
0, 180 -> 90 (two answers for this: this equation takes the clockwise answer from a)
180, 0 -> 270 (see above)
180, 1 -> 90.5
1, 180 -> 90.5
20, 350 -> 5
350, 20 -> 5 (all following examples reverse properly too)
10, 20 -> 15
350, 2 -> 356
359, 0 -> 359.5
180, 180 -> 180
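A quick Python check of this two-angle formula against the examples above (the helper name is mine):

def average_two(a, b):
    diff = ((a - b + 180 + 360) % 360) - 180
    return (360 + b + diff / 2) % 360

print(average_two(350, 2))    # 356.0
print(average_two(20, 350))   # 5.0
print(average_two(0, 180))    # 90.0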
ackb is right that these vector-based solutions cannot be considered true averages of angles, they are only an average of the unit vector counterparts. However, ackb's suggested solution does not appear to be mathematically sound.
The following is a solution that is mathematically derived from the goal of minimising (angle[i] - avgAngle)^2 (where the difference is corrected if necessary), which makes it a true arithmetic mean of the angles.
First, we need to look at exactly which cases the difference between angles differs from the difference between their normal-number counterparts. Consider angles x and y: if y >= x - 180 and y <= x + 180, then we can use the difference (x-y) directly. Otherwise, if the first condition is not met, we must use (y+360) in the calculation instead of y. Correspondingly, if the second condition is not met, we must use (y-360) instead of y. Since the equation of the curve we are minimising only changes at the points where these inequalities flip from true to false or vice versa, we can separate the full [0,360) range into a set of segments separated by these points. Then we only need to find the minimum of each of these segments, and then the minimum of the segments' minima, which is the average.
Here's an image demonstrating where the problems occur in calculating angle differences. If x lies in the gray area then there will be a problem.
To minimise a variable, depending on the curve, we can take the derivative of what we want to minimise and then we find the turning point (which is where the derivative = 0).
Here we will apply the idea of minimising the squared difference to derive the common arithmetic mean formula: sum(a[i])/n. The curve y = sum((a[i]-x)^2) can be minimised in this way:
y = sum((a[i]-x)^2)
= sum(a[i]^2 - 2*a[i]*x + x^2)
= sum(a[i]^2) - 2*x*sum(a[i]) + n*x^2
dy/dx = -2*sum(a[i]) + 2*n*x
for dy/dx = 0:
-2*sum(a[i]) + 2*n*x = 0
-> n*x = sum(a[i])
-> x = sum(a[i])/n
Now applying it to curves with our adjusted differences:
b = subset of a where the correct (angular) difference is a[i]-x
c = subset of a where the correct (angular) difference is (a[i]-360)-x
cn = size of c
d = subset of a where the correct (angular) difference is (a[i]+360)-x
dn = size of d

y = sum((b[i]-x)^2) + sum(((c[i]-360)-x)^2) + sum(((d[i]+360)-x)^2)
  = sum(b[i]^2 - 2*b[i]*x + x^2)
    + sum((c[i]-360)^2 - 2*(c[i]-360)*x + x^2)
    + sum((d[i]+360)^2 - 2*(d[i]+360)*x + x^2)
  = sum(b[i]^2) - 2*x*sum(b[i])
    + sum((c[i]-360)^2) - 2*x*(sum(c[i]) - 360*cn)
    + sum((d[i]+360)^2) - 2*x*(sum(d[i]) + 360*dn)
    + n*x^2
  = sum(b[i]^2) + sum((c[i]-360)^2) + sum((d[i]+360)^2)
    - 2*x*(sum(b[i]) + sum(c[i]) + sum(d[i]))
    - 2*x*360*(dn - cn)
    + n*x^2
  = sum(b[i]^2) + sum((c[i]-360)^2) + sum((d[i]+360)^2)
    - 2*x*sum(a[i])
    - 2*x*360*(dn - cn)
    + n*x^2

dy/dx = 2*n*x - 2*sum(a[i]) - 2*360*(dn - cn)

for dy/dx = 0:
2*n*x - 2*sum(a[i]) - 2*360*(dn - cn) = 0
n*x = sum(a[i]) + 360*(dn - cn)
x = (sum(a[i]) + 360*(dn - cn))/n
This alone is not quite enough to get the minimum. It works for normal values, whose domain is unbounded, so the turning point always lies within the segment's range and is therefore valid; here, though, we need the minimum within a range (defined by the segment). If the minimum is less than our segment's lower bound then the minimum of that segment must be at the lower bound (because quadratic curves have only one turning point), and if the minimum is greater than our segment's upper bound then the segment's minimum is at the upper bound. After we have the minimum for each segment, we simply find the one that has the lowest value for what we're minimising (sum((b[i]-x)^2) + sum(((c[i]-360)-x)^2) + sum(((d[i]+360)-x)^2)).
Here is an image of the curve, which shows how it changes at the points where x=(a[i]+180)%360. The data set in question is {65,92,230,320,250}.
Here is an implementation of the algorithm in Java, including some optimisations; its complexity is O(n log n). It can be reduced to O(n) if you replace the comparison-based sort with a non-comparison-based sort, such as radix sort.
static double varnc(double _mean, int _n, double _sumX, double _sumSqrX)
{
    return _mean*(_n*_mean - 2*_sumX) + _sumSqrX;
}

//with lower correction
static double varlc(double _mean, int _n, double _sumX, double _sumSqrX, int _nc, double _sumC)
{
    return _mean*(_n*_mean - 2*_sumX) + _sumSqrX
         + 2*360*_sumC + _nc*(-2*360*_mean + 360*360);
}

//with upper correction
static double varuc(double _mean, int _n, double _sumX, double _sumSqrX, int _nc, double _sumC)
{
    return _mean*(_n*_mean - 2*_sumX) + _sumSqrX
         - 2*360*_sumC + _nc*(2*360*_mean + 360*360);
}

static double[] averageAngles(double[] _angles)
{
    double sumAngles;
    double sumSqrAngles;
    double[] lowerAngles;
    double[] upperAngles;

    {
        List<Double> lowerAngles_ = new LinkedList<Double>();
        List<Double> upperAngles_ = new LinkedList<Double>();

        sumAngles = 0;
        sumSqrAngles = 0;
        for(double angle : _angles)
        {
            sumAngles += angle;
            sumSqrAngles += angle*angle;
            if(angle < 180)
                lowerAngles_.add(angle);
            else if(angle > 180)
                upperAngles_.add(angle);
        }

        Collections.sort(lowerAngles_);
        Collections.sort(upperAngles_, Collections.reverseOrder());

        lowerAngles = new double[lowerAngles_.size()];
        Iterator<Double> lowerAnglesIter = lowerAngles_.iterator();
        for(int i = 0; i < lowerAngles_.size(); i++)
            lowerAngles[i] = lowerAnglesIter.next();

        upperAngles = new double[upperAngles_.size()];
        Iterator<Double> upperAnglesIter = upperAngles_.iterator();
        for(int i = 0; i < upperAngles_.size(); i++)
            upperAngles[i] = upperAnglesIter.next();
    }

    List<Double> averageAngles = new LinkedList<Double>();
    averageAngles.add(180d);
    double variance = varnc(180, _angles.length, sumAngles, sumSqrAngles);

    double lowerBound = 180;
    double sumLC = 0;
    for(int i = 0; i < lowerAngles.length; i++)
    {
        //get average for a segment based on minimum
        double testAverageAngle = (sumAngles + 360*i)/_angles.length;
        //minimum is outside segment range (therefore not directly relevant)
        //since it is greater than lowerAngles[i], the minimum for the segment
        //must lie on the boundary lowerAngles[i]
        if(testAverageAngle > lowerAngles[i]+180)
            testAverageAngle = lowerAngles[i];

        if(testAverageAngle > lowerBound)
        {
            double testVariance = varlc(testAverageAngle, _angles.length, sumAngles, sumSqrAngles, i, sumLC);

            if(testVariance < variance)
            {
                averageAngles.clear();
                averageAngles.add(testAverageAngle);
                variance = testVariance;
            }
            else if(testVariance == variance)
                averageAngles.add(testAverageAngle);
        }

        lowerBound = lowerAngles[i];
        sumLC += lowerAngles[i];
    }

    //Test last segment
    {
        //get average for a segment based on minimum
        double testAverageAngle = (sumAngles + 360*lowerAngles.length)/_angles.length;
        //minimum is inside segment range
        //we will test average 0 (360) later
        if(testAverageAngle < 360 && testAverageAngle > lowerBound)
        {
            double testVariance = varlc(testAverageAngle, _angles.length, sumAngles, sumSqrAngles, lowerAngles.length, sumLC);

            if(testVariance < variance)
            {
                averageAngles.clear();
                averageAngles.add(testAverageAngle);
                variance = testVariance;
            }
            else if(testVariance == variance)
                averageAngles.add(testAverageAngle);
        }
    }

    double upperBound = 180;
    double sumUC = 0;
    for(int i = 0; i < upperAngles.length; i++)
    {
        //get average for a segment based on minimum
        double testAverageAngle = (sumAngles - 360*i)/_angles.length;
        //minimum is outside segment range (therefore not directly relevant)
        //since it is less than upperAngles[i], the minimum for the segment
        //must lie on the boundary upperAngles[i]
        if(testAverageAngle < upperAngles[i]-180)
            testAverageAngle = upperAngles[i];

        if(testAverageAngle < upperBound)
        {
            double testVariance = varuc(testAverageAngle, _angles.length, sumAngles, sumSqrAngles, i, sumUC);

            if(testVariance < variance)
            {
                averageAngles.clear();
                averageAngles.add(testAverageAngle);
                variance = testVariance;
            }
            else if(testVariance == variance)
                averageAngles.add(testAverageAngle);
        }

        upperBound = upperAngles[i];
        sumUC += upperBound;
    }

    //Test last segment
    {
        //get average for a segment based on minimum
        double testAverageAngle = (sumAngles - 360*upperAngles.length)/_angles.length;
        //minimum is inside segment range
        //we test average 0 (360) now
        if(testAverageAngle < 0)
            testAverageAngle = 0;

        if(testAverageAngle < upperBound)
        {
            double testVariance = varuc(testAverageAngle, _angles.length, sumAngles, sumSqrAngles, upperAngles.length, sumUC);

            if(testVariance < variance)
            {
                averageAngles.clear();
                averageAngles.add(testAverageAngle);
                variance = testVariance;
            }
            else if(testVariance == variance)
                averageAngles.add(testAverageAngle);
        }
    }

    double[] averageAngles_ = new double[averageAngles.size()];
    Iterator<Double> averageAnglesIter = averageAngles.iterator();
    for(int i = 0; i < averageAngles_.length; i++)
        averageAngles_[i] = averageAnglesIter.next();

    return averageAngles_;
}
The arithmetic mean of a set of angles may not agree with your intuitive idea of what the average should be. For example, the arithmetic mean of the set {179,179,0,181,181} is 216 (and 144). The answer you immediately think of is probably 180, however it is well known that the arithmetic mean is heavily affected by edge values. You should also remember that angles are not vectors, as appealing as that may seem when dealing with angles sometimes.
This algorithm does of course also apply to all quantities that obey modular arithmetic (with minimal adjustment), such as the time of day.
I would also like to stress that even though this is a true average of angles, unlike the vector solutions, that does not necessarily mean it is the solution you should be using; the average of the corresponding unit vectors may well be the value you actually should be using.
You have to define average more accurately. For the specific case of two angles, I can think of two different scenarios:
The "true" average, i.e. (a + b) / 2 % 360.
The angle that points "between" the two others while staying in the same semicircle, e.g. for 355 and 5, this would be 0, not 180. To do this, you need to check if the difference between the two angles is larger than 180 or not. If so, increment the smaller angle by 360 before using the above formula.
I don't see how the second alternative can be generalized for the case of more than two angles, though.
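For the two-angle case, the second alternative is simple to write down; a small Python sketch (the generalization question stands):

def semicircle_average(a, b):
    # if the angles are more than 180 apart, shift the smaller one up by 360
    if abs(a - b) > 180:
        if a < b:
            a += 360
        else:
            b += 360
    return ((a + b) / 2) % 360

print(semicircle_average(355, 5))   # 0.0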
I'd like to share a method I used with a microcontroller which did not have floating-point or trigonometry capabilities. I still needed to "average" 10 raw bearing readings in order to smooth out variations.
Check whether the first bearing is in the range 270-360 or 0-90 degrees (northern two quadrants)
If it is, rotate this and all subsequent readings by 180 degrees, keeping all values in the range 0 <= bearing < 360. Otherwise take the readings as they come.
Once 10 readings have been taken, calculate the numerical average assuming that there has been no wraparound
If the 180 degree rotation had been in effect then rotate the calculated average by 180 degrees to get back to a "true" bearing.
It's not ideal; it can break. I got away with it in this case because the device only rotates very slowly. I'll put it out there in case anyone else finds themselves working under similar restrictions.
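A sketch of this rotation trick in Python (the original ran integer-only on a microcontroller; this just illustrates the logic):

def mean_bearing_rotated(bearings):
    # rotate by 180 if the first reading falls in the northern two quadrants
    rotate = bearings[0] >= 270 or bearings[0] < 90
    if rotate:
        bearings = [(b + 180) % 360 for b in bearings]
    avg = sum(bearings) / len(bearings)
    return (avg + 180) % 360 if rotate else avg

print(mean_bearing_rotated([359, 1, 3, 357]))   # 0.0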
Like all averages, the answer depends upon the choice of metric. For a given metric M, the average of some angles a_k in [-pi,pi] for k in [1,N] is that angle a_M which minimizes the sum of squared distances d^2_M(a_M,a_k). For a weighted mean, one simply includes in the sum the weights w_k (such that sum_k w_k = 1). That is,
a_M = arg min_x sum_k w_k d^2_M(x,a_k)
Two common choices of metric are the Frobenius and the Riemann metrics. For the Frobenius metric, a direct formula exists that corresponds to the usual notion of average bearing in circular statistics. See "Means and Averaging in the Group of Rotations", Maher Moakher, SIAM Journal on Matrix Analysis and Applications, Volume 24, Issue 1, 2002, for details.
http://link.aip.org/link/?SJMAEL/24/1/1
Here's a function for GNU Octave 3.2.4 that does the computation:
function ma=meanangleoct(a,w,hp,ntype)
% ma=meanangleoct(a,w,hp,ntype) returns the average of angles a
% given weights w and half-period hp using norm type ntype
% Ref: "Means and Averaging in the Group of Rotations",
% Maher Moakher, SIAM Journal on Matrix Analysis and Applications,
% Volume 24, Issue 1, 2002.
if (nargin<1) | (nargin>4), help meanangleoct, return, end
if isempty(a), error('no measurement angles'), end
la=length(a); sa=size(a);
if prod(sa)~=la, error('a must be a vector'); end
if (nargin<4) || isempty(ntype), ntype='F'; end
if ~sum(ntype==['F' 'R']), error('ntype must be F or R'), end
if (nargin<3) || isempty(hp), hp=pi; end
if (nargin<2) || isempty(w), w=1/la+0*a; end
lw=length(w); sw=size(w);
if prod(sw)~=lw, error('w must be a vector'); end
if lw~=la, error('length of w must equal length of a'), end
if sum(w)~=1, warning('resumming weights to unity'), w=w/sum(w); end
a=a(:); % make column vector
w=w(:); % make column vector
a=mod(a+hp,2*hp)-hp; % reduce to central period
a=a/hp*pi; % scale to half period pi
z=exp(i*a); % U(1) elements
% % NOTA BENE:
% % fminbnd can get hung up near the boundaries.
% % If that happens, shift the input angles a
% % forward by one half period, then shift the
% % resulting mean ma back by one half period.
% X=fminbnd(@meritfcn,-pi,pi,[],z,w,ntype);
% % seems to work better
x0=imag(log(sum(w.*z)));
X=fminbnd(@meritfcn,x0-pi,x0+pi,[],z,w,ntype);
% X=real(X); % truncate some roundoff
X=mod(X+pi,2*pi)-pi; % reduce to central period
ma=X*hp/pi; % scale to half period hp
return
%%%%%%
function d2=meritfcn(x,z,w,ntype)
x=exp(i*x);
if ntype=='F'
y=x-z;
else % ntype=='R'
y=log(x'*z);
end
d2=y'*diag(w)*y;
return
%%%%%%
% % test script
% %
% % NOTA BENE: meanangleoct(a,[],[],'R') will equal mean(a)
% % when all abs(a-b) < pi/2 for some value b
% %
% na=3, a=sort(mod(randn(1,na)+1,2)-1)*pi;
% da=diff([a a(1)+2*pi]); [mda,ndx]=min(da);
% a=circshift(a,[0 2-ndx]) % so that diff(a(2:3)) is smallest
% A=exp(i*a), B1=expm(a(1)*[0 -1; 1 0]),
% B2=expm(a(2)*[0 -1; 1 0]), B3=expm(a(3)*[0 -1; 1 0]),
% masimpl=[angle(mean(exp(i*a))) mean(a)]
% Bsum=B1+B2+B3; BmeanF=Bsum/sqrt(det(Bsum));
% % this expression for BmeanR should be correct for ordering of a above
% BmeanR=B1*(B1'*B2*(B2'*B3)^(1/2))^(2/3);
% mamtrx=real([[0 1]*logm(BmeanF)*[1 0]' [0 1]*logm(BmeanR)*[1 0]'])
% manorm=[meanangleoct(a,[],[],'F') meanangleoct(a,[],[],'R')]
% polar(a,1+0*a,'b*'), axis square, hold on
% polar(manorm(1),1,'rs'), polar(manorm(2),1,'gd'), hold off
% Meanangleoct Version 1.0
% Copyright (C) 2011 Alphawave Research, robjohnson@alphawaveresearch.com
% Released under GNU GPLv3 -- see file COPYING for more info.
%
% Meanangle is free software: you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation, either version 3 of the License, or (at
% your option) any later version.
%
% Meanangle is distributed in the hope that it will be useful, but
% WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
% General Public License for more details.
%
% You should have received a copy of the GNU General Public License
% along with this program. If not, see `http://www.gnu.org/licenses/'.
In Python, with angles between [-180, 180):

def add_angles(a, b):
    return (a + b + 180) % 360 - 180

def average_angles(a, b):
    return add_angles(a, add_angles(-a, b) / 2)
Details:
For the average of two angles there are two averages 180° apart, but we may want the closer average.
Visually, the average of the blue (b) and green (a) points yields the teal point:
Angles 'wrap around' (e.g. 355 + 10 = 5), but standard arithmetic will ignore this branch point.
However if angle b is opposite to the branch point, then (b + g)/2 gives the closest average: the teal point.
For any two angles, we can rotate the problem so one of the angles is opposite to the branch point, perform standard averaging, then rotate back.
Here is the full solution:
(the input is an array of bearings in degrees (0-360))
public static int getAverageBearing(int[] arr)
{
    double sumSin = 0;
    double sumCos = 0;
    int counter = 0;
    for (double bearing : arr)
    {
        bearing *= Math.PI/180;
        sumSin += Math.sin(bearing);
        sumCos += Math.cos(bearing);
        counter++;
    }
    int avBearing = INVALID_ANGLE_VALUE;
    if (counter > 0)
    {
        double bearingInRad = Math.atan2(sumSin/counter, sumCos/counter);
        avBearing = (int) (bearingInRad*180f/Math.PI);
        if (avBearing < 0)
            avBearing += 360;
    }
    return avBearing;
}
In English:
Make a second data set with all angles shifted by 180.
Take the variance of both data sets.
Take the average of the data set with the smallest variance.
If this average is from the shifted set then shift the answer again by 180.
In python:
import numpy as np

A = ...  # numpy Nx1 array of angles

if np.var(A) < np.var((A - 180) % 360):
    average = np.average(A)
else:
    average = (np.average((A - 180) % 360) + 180) % 360
If anyone is looking for a JavaScript solution to this, I've translated the example given in the wikipedia page Mean of circular quantities (which was also referred to in Nick's answer) into JavaScript/NodeJS code, with help from the mathjs library.
If your angles are in degrees:
const maths = require('mathjs');

getAverageDegrees = (array) => {
    let arrayLength = array.length;
    let sinTotal = 0;
    let cosTotal = 0;

    for (let i = 0; i < arrayLength; i++) {
        sinTotal += maths.sin(array[i] * (maths.pi / 180));
        cosTotal += maths.cos(array[i] * (maths.pi / 180));
    }

    let averageDirection = maths.atan(sinTotal / cosTotal) * (180 / maths.pi);

    if (cosTotal < 0) {
        averageDirection += 180;
    } else if (sinTotal < 0) {
        averageDirection += 360;
    }

    return averageDirection;
}
This solution worked really well for me in order to find the average direction from a set of compass directions. I've tested this on a large range of directional data (0-360 degrees) and it seems very robust.
Alternatively, if your angles are in radians:
const maths = require('mathjs');

getAverageRadians = (array) => {
    let arrayLength = array.length;
    let sinTotal = 0;
    let cosTotal = 0;

    for (let i = 0; i < arrayLength; i++) {
        sinTotal += maths.sin(array[i]);
        cosTotal += maths.cos(array[i]);
    }

    let averageDirection = maths.atan(sinTotal / cosTotal);

    // quadrant corrections in radians (pi and 2*pi, not 180 and 360)
    if (cosTotal < 0) {
        averageDirection += maths.pi;
    } else if (sinTotal < 0) {
        averageDirection += 2 * maths.pi;
    }

    return averageDirection;
}
Hopefully these solutions are helpful to someone facing a similar programming challenge to me.
I would go the vector way using complex numbers. My example is in Python, which has built-in complex numbers:
import cmath  # complex math

def average_angle(list_of_angles):
    # make a new list of vectors
    vectors = [cmath.rect(1, angle)  # length 1 for each vector
               for angle in list_of_angles]
    vector_sum = sum(vectors)
    # no need to average, we don't care for the modulus
    return cmath.phase(vector_sum)
Note that Python does not need to build a temporary new list of vectors, all of the above can be done in one step; I just chose this way to approximate pseudo-code applicable to other languages too.
Here's a complete C++ solution:
#include <vector>
#include <cmath>

double dAngleAvg(const std::vector<double>& angles) {
    auto avgSin = double{ 0.0 };
    auto avgCos = double{ 0.0 };
    static const auto conv   = double{ 0.01745329251994 }; // PI / 180
    static const auto i_conv = double{ 57.2957795130823 }; // 180 / PI
    for (const auto& theta : angles) {
        avgSin += std::sin(theta * conv);
        avgCos += std::cos(theta * conv);
    }
    avgSin /= (double)angles.size();
    avgCos /= (double)angles.size();
    auto ret = double{ 90.0 - std::atan2(avgCos, avgSin) * i_conv };
    if (ret < 0.0) ret += 360.0;
    return std::fmod(ret, 360.0);
}
It takes the angles in the form of a vector of doubles, and returns the average simply as a double. The angles must be in degrees, and of course the average is in degrees as well.
Based on Alnitak's answer, I've written a Java method for calculating the average of multiple angles:
If your angles are in radians:
public static double averageAngleRadians(double... angles) {
    double x = 0;
    double y = 0;
    for (double a : angles) {
        x += Math.cos(a);
        y += Math.sin(a);
    }
    return Math.atan2(y, x);
}

If your angles are in degrees:

public static double averageAngleDegrees(double... angles) {
    double x = 0;
    double y = 0;
    for (double a : angles) {
        x += Math.cos(Math.toRadians(a));
        y += Math.sin(Math.toRadians(a));
    }
    return Math.toDegrees(Math.atan2(y, x));
}
Here's an idea: build the average iteratively by always calculating the average of the angles that are closest together, keeping a weight.
Another idea: find the largest gap between the given angles. Find the point that bisects it, and then pick the opposite point on the circle as the reference zero to calculate the average from.
Let's represent these angles with points on the circumference of the circle.
Can we assume that all these points fall on the same half of the circle? (Otherwise, there is no obvious way to define the "average angle". Think of two points on the diameter, e.g. 0 deg and 180 deg --- is the average 90 deg or 270 deg? What happens when we have 3 or more evenly spread out points?)
With this assumption, we pick an arbitrary point on that semicircle as the "origin", and measure the given set of angles with respect to this origin (call this the "relative angle"). Note that the relative angle has an absolute value strictly less than 180 deg. Finally, take the mean of these relative angles to get the desired average angle (relative to our origin of course).
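A small Python sketch of this relative-angle idea, under the same half-circle assumption (the names are mine):

def mean_about_origin(angles_deg, origin_deg):
    # measure each angle relative to the chosen origin, in [-180, 180)
    rel = [((a - origin_deg + 180) % 360) - 180 for a in angles_deg]
    return (origin_deg + sum(rel) / len(rel)) % 360

print(mean_about_origin([350, 10], 0))   # 0.0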
There's no single "right answer". I recommend reading the book,
K. V. Mardia and P. E. Jupp, "Directional Statistics", (Wiley, 1999),
for a thorough analysis.
(Just want to share my viewpoint from Estimation Theory or Statistical Inference)
Nimble's trial is to get the MMSE^ estimate of a set of angles, but it's one of several choices for finding an "averaged" direction; one can also find an MMAE^ estimate, or some other estimate, to be the "averaged" direction, and it depends on your metric quantifying the error of a direction; or, more generally in estimation theory, on the definition of the cost function.
^ MMSE/MMAE corresponds to minimum mean squared/absolute error.
ackb said "The average angle phi_avg should have the property that sum_i|phi_avg-phi_i|^2 becomes minimal...they average something, but not angles"
---- you quantify errors in the mean-squared sense, and that is one of the most common ways, however, not the only way. The answer favored by most people here (i.e., sum the unit vectors and get the angle of the result) is actually one of the reasonable solutions. It is (it can be proved) the ML estimator that serves as the "averaged" direction we want, if the directions of the vectors are modeled as a von Mises distribution. This distribution is not fancy; it is just a periodically sampled distribution from a 2D Gaussian. See Eqn. (2.179) in Bishop's book "Pattern Recognition and Machine Learning". Again, it is by no means the only best one to represent "average" direction, however, it is quite a reasonable one that has both good theoretical justification and simple implementation.
Nimble said "ackb is right that these vector based solutions cannot be considered true averages of angles, they are only an average of the unit vector counterparts"
----this is not true. The "unit vector counterparts" reveals the information of the direction of a vector. The angle is a quantity without considering the length of the vector, and the unit vector is something with additional information that the length is 1. You can define your "unit" vector to be of length 2, it does not really matter.
You can see a solution and a little explanation in the following link, for ANY programming language:
https://rosettacode.org/wiki/Averages/Mean_angle
For instance, C++ solution:
#include <math.h>
#include <stdio.h>

double meanAngle (double *angles, int size)
{
    double y_part = 0, x_part = 0;
    int i;

    for (i = 0; i < size; i++)
    {
        x_part += cos (angles[i] * M_PI / 180);
        y_part += sin (angles[i] * M_PI / 180);
    }

    return atan2 (y_part / size, x_part / size) * 180 / M_PI;
}

int main ()
{
    double angleSet1[] = { 350, 10 };
    double angleSet2[] = { 90, 180, 270, 360 };
    double angleSet3[] = { 10, 20, 30 };

    printf ("\nMean Angle for 1st set : %lf degrees", meanAngle (angleSet1, 2));
    printf ("\nMean Angle for 2nd set : %lf degrees", meanAngle (angleSet2, 4));
    printf ("\nMean Angle for 3rd set : %lf degrees\n", meanAngle (angleSet3, 3));
    return 0;
}
Output:
Mean Angle for 1st set : -0.000000 degrees
Mean Angle for 2nd set : -90.000000 degrees
Mean Angle for 3rd set : 20.000000 degrees
Or Matlab solution:
function u = mean_angle(phi)
    u = angle(mean(exp(i*pi*phi/180)))*180/pi;
end
mean_angle([350, 10])
ans = -2.7452e-14
mean_angle([90, 180, 270, 360])
ans = -90
mean_angle([10, 20, 30])
ans = 20.000
Here is a completely arithmetic solution using moving averages and taking care to normalize values. It is fast and delivers correct answers if all angles are on one side of the circle (within 180° of each other).
It is mathematically equivalent to adding an offset which shifts the values into the range (0, 180), calculating the mean, and then subtracting the offset.
The comments describe what range a specific value can take on at any given time.
// angles have to be in the range [0, 360) and within 180° of each other.
// n >= 1
// returns the circular average of the angles in the range [0, 360).
double meanAngle(double* angles, int n)
{
    double average = angles[0];
    for (int i = 1; i < n; i++)
    {
        // average: (0, 360)
        double diff = angles[i] - average;
        // diff: (-540, 540)
        if (diff < -180)
            diff += 360;
        else if (diff >= 180)
            diff -= 360;
        // diff: (-180, 180)
        average += diff/(i+1);
        // average: (-180, 540)
        if (average < 0)
            average += 360;
        else if (average >= 360)
            average -= 360;
        // average: (0, 360)
    }
    return average;
}
Well I'm hugely late to the party but thought I'd add my 2 cents' worth, as I couldn't really find any definitive answer. In the end I implemented the following Java version of the Mitsuta method which, I hope, provides a simple and robust solution. Particularly as the standard deviation provides both a measure of dispersion and, if sd == 90, indicates that the input angles result in an ambiguous mean.
EDIT: Actually I realised that my original implementation can be even further simplified, in fact worryingly simple considering all the conversation and trigonometry going on in the other answers.
/**
 * The Mitsuta method
 *
 * @param angles Angles from 0 - 360
 * @return double array containing
 *         0 - mean
 *         1 - sd: a measure of angular dispersion, in the range [0..360], similar to standard deviation.
 *         Note if sd == 90 then the mean can also be its inverse, i.e. 360 == 0, 300 == 60.
 */
public static double[] getAngleStatsMitsuta(double... angles) {
    double sum = 0;
    double sumsq = 0;
    for (double angle : angles) {
        if (angle >= 180) {
            angle -= 360;
        }
        sum += angle;
        sumsq += angle * angle;
    }
    double mean = sum / angles.length;
    return new double[]{mean <= 0 ? 360 + mean : mean,
            Math.sqrt(sumsq / angles.length - (mean * mean))};
}
... and for all you (Java) geeks out there, you can use the above approach to get the mean angle in one line.
Arrays.stream(angles).map(angle -> angle<180 ? angle: (angle-360)).sum() / angles.length;
Alnitak has the right solution. Nick Fortescue's solution is functionally the same.
For the special case of where
( sum(x_component) = 0.0 && sum(y_component) = 0.0 ) // e.g. 2 angles of 10. and 190. degrees ea.
use 0.0 degrees as the sum
Computationally you have to test for this case since atan2(0. , 0.) is undefined and will generate an error.
The average angle phi_avg should have the property that sum_i |phi_avg - phi_i|^2 becomes minimal, where the difference has to be taken in [-Pi, Pi) (because it might be shorter to go the other way around!). This is easily achieved by normalizing all input values to [0, 2Pi), keeping a running average phi_run, and normalizing each |phi_i - phi_run| to [-Pi, Pi) (by adding or subtracting 2Pi). Most suggestions above do something else that does not have that minimal property, i.e., they average something, but not angles.
I solved the problem with the help of the answer from @David_Hanak.
As he states:
The angle that points "between" the two others while staying in the same semicircle, e.g. for 355 and 5, this would be 0, not 180. To do this, you need to check if the difference between the two angles is larger than 180 or not. If so, increment the smaller angle by 360 before using the above formula.
So what I did was calculate the average of all the angles. Then increase all the angles that are less than this average by 360, and recalculate the average by adding them all and dividing by their count.
float angleY = 0f;
int count = eulerAngles.Count;

for (byte i = 0; i < count; i++)
    angleY += eulerAngles[i].y;

float averageAngle = angleY / count;

angleY = 0f;
for (byte i = 0; i < count; i++)
{
    float angle = eulerAngles[i].y;
    if (angle < averageAngle)
        angle += 360f;
    angleY += angle;
}

angleY = angleY / count;
Works perfectly.
Python function:
from math import sin, cos, atan2, pi
import numpy as np

def meanangle(angles, weights=0, setting='degrees'):
    '''computes the mean angle'''
    if weights == 0:
        weights = np.ones(len(angles))
    sumsin = 0
    sumcos = 0
    if setting == 'degrees':
        angles = np.array(angles)*pi/180
    for i in range(len(angles)):
        sumsin += weights[i]/sum(weights)*sin(angles[i])
        sumcos += weights[i]/sum(weights)*cos(angles[i])
    average = atan2(sumsin, sumcos)
    if setting == 'degrees':
        average = average*180/pi
    return average
You can use this function in Matlab:
function retVal = DegreeAngleMean(x)

    len = length(x);
    sum1 = 0;
    sum2 = 0;
    count1 = 0;
    count2 = 0;

    for i = 1:len
        if x(i) < 180
            sum1 = sum1 + x(i);
            count1 = count1 + 1;
        else
            sum2 = sum2 + x(i);
            count2 = count2 + 1;
        end
    end

    if (count1 > 0)
        k1 = sum1 / count1;
    end
    if (count2 > 0)
        k2 = sum2 / count2;
    end

    if count1 > 0 && count2 > 0
        if (k2 - k1 >= 180)
            retVal = ((sum1 + sum2) - count2*360) / len;
        else
            retVal = (sum1 + sum2) / len;
        end
    elseif count1 > 0
        retVal = k1;
    else
        retVal = k2;
    end
While starblue's answer gives the angle of the average unit vector, it is possible to extend the concept of the arithmetic mean to angles if you accept that there may be more than one answer in the range of 0 to 2*pi (or 0° to 360°). For example, the average of 0° and 180° may be either 90° or 270°.
The arithmetic mean has the property of being the single value with the minimum sum of squared distances to the input values. The distance along the unit circle between two unit vectors can be easily calculated as the inverse cosine of their dot product. If we choose a unit vector by minimizing the sum of the squared inverse cosine of the dot product of our vector and each input unit vector then we have an equivalent average. Again, keep in mind that there may be two or more minimums in exceptional cases.
This concept could be extended to any number of dimensions, since the distance along the unit sphere can be calculated in the exact same way as the distance along the unit circle--the inverse cosine of the dot product of two unit vectors.
For circles we could solve for this average in a number of ways, but I propose the following O(n^2) algorithm (angles are in radians, and I avoid calculating the unit vectors):
var bestAverage = -1
double minimumSquareDistance
for each a1 in input
    var sumA = 0;
    for each a2 in input
        var a = (a2 - a1) mod (2*pi) + a1
        sumA += a
    end for
    var averageHere = sumA / input.count
    var sumSqDistHere = 0
    for each a2 in input
        var dist = (a2 - averageHere + pi) mod (2*pi) - pi // keep within range of -pi to pi
        sumSqDistHere += dist * dist
    end for
    if (bestAverage < 0 OR sumSqDistHere < minimumSquareDistance) // for exceptional cases, sumSqDistHere may equal minimumSquareDistance at least once; in these cases we will only find one of the averages
        minimumSquareDistance = sumSqDistHere
        bestAverage = averageHere
    end if
end for
return bestAverage
If all the angles are within 180° of each other, then we could use a simpler O(n)+O(sort) algorithm (again using radians and avoiding use of unit vectors):
sort(input)
var largestGapEnd = input[0]
var largestGapSize = (input[0] - input[input.count-1]) mod (2*pi)
for (int i = 1; i < input.count; ++i)
    var gapSize = (input[i] - input[i - 1]) mod (2*pi)
    if (gapSize > largestGapSize)
        largestGapSize = gapSize
        largestGapEnd = input[i]
    end if
end for

double sum = 0
for each angle in input
    var a2 = (angle - largestGapEnd) mod (2*pi) + largestGapEnd
    sum += a2
end for

return sum / input.count
To use degrees, simply replace pi with 180. If you plan to use more dimensions then you will most likely have to use an iterative method to solve for the average.
The problem is extremely simple.
1. Make sure all angles are between -180 and 180 degrees.
2a. Add all non-negative angles, take their average, and COUNT how many.
2b. Add all negative angles, take their average, and COUNT how many.
3. Take the difference of pos_average minus neg_average.
If the difference is greater than 180, change the difference to 360 minus the difference; otherwise just change the sign of the difference. Note that the difference is always non-negative.
The Average_Angle equals pos_average plus the difference times the "weight", i.e. the negative count divided by the sum of the negative and positive counts.
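A Python sketch of this recipe (assuming the angles are already normalized to [-180, 180]; the one-sided cases fall back to a plain mean):

def simple_circular_mean(angles):
    pos = [a for a in angles if a >= 0]
    neg = [a for a in angles if a < 0]
    if not pos or not neg:
        return sum(angles) / len(angles)
    pos_avg = sum(pos) / len(pos)
    neg_avg = sum(neg) / len(neg)
    diff = pos_avg - neg_avg
    diff = 360 - diff if diff > 180 else -diff
    return pos_avg + diff * len(neg) / len(angles)

print(simple_circular_mean([-10, 10]))   # 0.0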
Here is some Java code to average angles; I think it's reasonably robust.
public static double getAverageAngle(List<Double> angles)
{
    // r = right (0 to 180 degrees)
    // l = left (180 to 360 degrees)
    double rTotal = 0;
    double lTotal = 0;
    double rCtr = 0;
    double lCtr = 0;

    for (Double angle : angles)
    {
        double norm = normalize(angle);
        if (norm >= 180)
        {
            lTotal += norm;
            lCtr++;
        } else
        {
            rTotal += norm;
            rCtr++;
        }
    }

    double rAvg = rTotal / Math.max(rCtr, 1.0);
    double lAvg = lTotal / Math.max(lCtr, 1.0);

    if (rAvg > lAvg + 180)
    {
        lAvg += 360;
    }
    if (lAvg > rAvg + 180)
    {
        rAvg += 360;
    }

    double rPortion = rAvg * (rCtr / (rCtr + lCtr));
    double lPortion = lAvg * (lCtr / (lCtr + rCtr));
    return normalize(rPortion + lPortion);
}

public static double normalize(double angle)
{
    double result = angle;
    if (angle >= 360)
    {
        result = angle % 360;
    }
    if (angle < 0)
    {
        result = 360 + (angle % 360);
    }
    return result;
}
