Dataset for meaningless “Nearest Neighbor”?

Dataset for meaningless “Nearest Neighbor”? - algorithm

In the paper "When Is 'Nearest Neighbor' Meaningful?" we read that, "We show that under certain broad conditions (in terms of data and query distributions, or workload), as dimensionality increases, the distance to the nearest
neighbor approaches the distance to the farthest neighbor. In other words, the contrast in distances to different data points becomes nonexistent. The conditions
we have identified in which this happens are much broader than the independent and identically distributed (IID) dimensions assumption that other work
assumes."
My question is how I should generate a dataset that resembles this effect? I have created three points each with 1000 dimensions with random numbers ranging from 0-255 for each dimension but points create different distances and do not reproduce what is mentioned above. It seems changing dimensions (e.g. 10 or 100 or 1000 dimensions) and ranges (e.g. [0,1]) do not change anything. I still get different distances which should not be any problem for e.g. clustering algorithms!

I hadn't heard of this before either, so I am little defensive, since I have seen that real and synthetic datasets in high dimensions really do not support the claim of the paper in question.
As a result, what I would suggest, as a first, dirty, clumsy and maybe not good first attempt is to generate a sphere in a dimension of your choice (I do it like like this) and then place a query at the center of the sphere.
In that case, every point lies in the same distance with the query point, thus the Nearest Neighbor has a distance equal to the Farthest Neighbor.
This, of course, is independent from the dimension, but it's what came at a thought after looking at the figures of the paper. It should be enough to get you stared, but surely, better datasets may be generated, if any.
Edit about:
distances for each point got bigger with more dimensions!!!!
this is expected, since the higher the dimensional space, the sparser the space is, thus the greater the distance is. Moreover, this is expected, if you think for example, the Euclidean distance, which gets grater as the dimensions grow.

I think the paper is right. First, your test: One problem with your test may be that you are using too few points. I used 10000 point and below are my results (evenly distributed point in [0.0 ... 1.0] in all dimensions). For DIM=2, min/max differ almost by a factor of 1000, for DIM=1000, they only differ by a factor of 1.6, for DIM=10000 by 1.248 . So I'd say these results confirm the paper's hypothesis.
DIM/N = 2 / 10000
min/avg/max= 1.0150906548224441E-5 / 0.019347838262624064 / 0.9993862941797146
DIM/N = 10 / 10000.0
min/avg/max= 0.011363500131326938 / 0.9806472676701363 / 1.628460468042207
DIM/N = 100 / 10000
min/avg/max= 0.7701271349716637 / 1.3380320375218808 / 2.1878136533925328
DIM/N = 1000 / 10000
min/avg/max= 2.581913326565635 / 3.2871335447262178 / 4.177669393187736
DIM/N = 10000 / 10000
min/avg/max= 8.704666143050158 / 9.70540814778645 / 10.85760200249862
DIM/N = 100000 / 1000 (N=1000!)
min/avg/max= 30.448610133282717 / 31.14936583713578 / 31.99082677476165
I guess the explanation is as follows: Lets take three randomly generated vectors, A, B and C. The total distance is based on the sum of the distances of each individual row of these vectors. The more dimensions the vectors have, the more the total sum of differences will approach a common average. In other word, it is highly unlikely that a vector C has in all elements a larger distance to A than another vector B has to A. With increasing dimensions, C and B will have increasingly similar distance to A (and to each other).
My test dataset was created as follow. The dataset is essentially a cube ranging from 0.0 to 1.0 in every dimension. The coordinates were created with uniform distribution in every dimension between 0.0 and 1.0. Example code (N=10000, DIM=[2..10000]):
public double[] generate(int N, int DIM) {
double[] data = new double[N*DIM];
for (int i = 0; i < N; i++) {
int pos = DIM*i;
for (int d = 0; d < DIM; d++) {
data[pos+d] = R.nextDouble();
}
}
return data;
}
Following the equation given at the bottom of the accepted answer here, we get:
d=2 -> 98460
d=10 -> 142.3
d=100 -> 1.84
d=1,000 -> 0.618
d=10,000 -> 0.247
d=100,000 -> 0.0506 (using N=1000)

Related

In a restricted space with n dimension, how to find the coordinates of p points, so that they are as far as possible from each other?

For example, in a 2D space, with x [0 ; 1] and y [0 ; 1]. For p = 4, intuitively, I will place each point at each corner of the square.
But what can be the general algorithm?

Edit: The algorithm needs modification if dimensions are not orthogonal to eachother
To uniformly place the points as described in your example you could do something like this:
var combinedSize = 0
for each dimension d in d0..dn {
combinedSize += d.length;
}
val listOfDistancesBetweenPointsAlongEachDimension = new List
for each d dimension d0..dn {
val percentageOfWholeDimensionSize = d.length/combinedSize
val pointsToPlaceAlongThisDimension = percentageOfWholeDimensionSize * numberOfPoints
listOfDistancesBetweenPointsAlongEachDimension[d.index] = d.length/(pointsToPlaceAlongThisDimension - 1)
}
Run on your example it gives:
combinedSize = 2
percentageOfWholeDimensionSize = 1 / 2
pointsToPlaceAlongThisDimension = 0.5 * 4
listOfDistancesBetweenPointsAlongEachDimension[0] = 1 / (2 - 1)
listOfDistancesBetweenPointsAlongEachDimension[1] = 1 / (2 - 1)
note: The minus 1 deals with the inclusive interval, allowing points at both endpoints of the dimension

2D case
In 2D (n=2) the solution is to place your p points evenly on some circle. If you want also to define the distance d between points then the circle should have radius around:
2*Pi*r = ~p*d
r = ~(p*d)/(2*Pi)
To be more precise you should use circumference of regular p-point polygon instead of circle circumference (I am too lazy to do that). Or you can compute the distance of produced points and scale up/down as needed instead.
So each point p(i) can be defined as:
p(i).x = r*cos((i*2.0*Pi)/p)
p(i).y = r*sin((i*2.0*Pi)/p)
3D case
Just use sphere instead of circle.
ND case
Use ND hypersphere instead of circle.
So your question boils down to place p "equidistant" points to a n-D hypersphere (either surface or volume). As you can see 2D case is simple, but in 3D this starts to be a problem. See:
Make a sphere with equidistant vertices
sphere subdivision triangulation
As you can see there are quite a few approaches to do this (there are much more of them even using Fibonacci sequence generated spiral) which are more or less hard to grasp or implement.
However If you want to generalize this into ND space you need to chose general approach. I would try to do something like this:
Place p uniformly distributed place inside bounding hypersphere
each point should have position,velocity and acceleration vectors. You can also place the points randomly (just ensure none are at the same position)...
For each p compute acceleration
each p should retract any other point (opposite of gravity).
update position
just do a Newton D'Alembert physics simulation in ND. Do not forget to include some dampening of speed so the simulation will stop in time. Bound the position and speed to the sphere so points will not cross it's border nor they would reflect the speed inwards.
loop #2 until max speed of any p crosses some threshold
This will more or less accurately place p points on the circumference of ND hypersphere. So you got minimal distance d between them. If you got some special dependency between n and p then there might be better configurations then this but for arbitrary numbers I think this approach should be safe enough.
Now by modifying #2 rules you can achieve 2 different outcomes. One filling hypersphere surface (by placing massive negative mass into center of surface) and second filling its volume. For these two options also the radius will be different. For one you need to use surface and for the other volume...
Here example of similar simulation used to solve a geometry problem:
How to implement a constraint solver for 2-D geometry?
Here preview of 3D surface case:
The number on top is the max abs speed of particles used to determine the simulations stopped and the white-ish lines are speed vectors. You need to carefully select the acceleration and dampening coefficients so the simulation is fast ...

Algorithm to smooth a curve while keeping the area under it constant

Consider a discrete curve defined by the points (x1,y1), (x2,y2), (x3,y3), ... ,(xn,yn)
Define a constant SUM = y1+y2+y3+...+yn. Say we change the value of some k number of y points (increase or decrease) such that the total sum of these changed points is less than or equal to the constant SUM.
What would be the best possible manner to adjust the other y points given the following two conditions:
The total sum of the y points (y1'+y2'+...+yn') should remain constant ie, SUM.
The curve should retain as much of its original shape as possible.
A simple solution would be to define some delta as follows:
delta = (ym1' + ym2' + ym3' + ... + ymk') - (ym1 + ym2 + ym3 + ... + ymk')
and to distribute this delta over the rest of the points equally. Here ym1' is the value of the modified point after modification and ym1 is the value of the modified point before modification to give delta as the total difference in modification.
However this would not ensure a totally smoothed curve as area near changed points would appear ragged. Does a better solution/algorithm exist for the this problem?

I've used the following approach, though it is a bit OTT.
Consider adding d[i] to y[i], to get s[i], the smoothed value.
We seek to minimise
S = Sum{ 1<=i<N-1 | sqr( s[i+1]-2*s[i]+s[i-1] } + f*Sum{ 0<=i<N | sqr( d[i])}
The first term is a sum of the squares of (an approximate) second derivative of the curve, and the second term penalises moving away from the original. f is a (positive) constant. A little algebra recasts this as
S = sqr( ||A*d - b||)
where the matrix A has a nice structure, and indeed A'*A is penta-diagonal, which means that the normal equations (ie d = Inv(A'*A)*A'*b) can be solved efficiently. Note that d is computed directly, there is no need to initialise it.
Given the solution d to this problem we can compute the solution d^ to the same problem but with the constraint One'*d = 0 (where One is the vector of all ones) like this
d^ = d - (One'*d/Q) * e
e = Inv(A'*A)*One
Q = One'*e
What value to use for f? Well a simple approach is to try out this procedure on sample curves for various fs and pick a value that looks good. Another approach is to pick a estimate of smoothness, for example the rms of the second derivative, and then a value that should attain, and then search for an f that gives that value. As a general rule, the bigger f is the less smooth the smoothed curve will be.
Some motivation for all this. The aim is to find a 'smooth' curve 'close' to a given one. For this we need a measure of smoothness (the first term in S) and a measure of closeness (the second term. Why these measures? Well, each are easy to compute, and each are quadratic in the variables (the d[]); this will mean that the problem becomes an instance of linear least squares for which there are efficient algorithms available. Moreover each term in each sum depends on nearby values of the variables, which will in turn mean that the 'inverse covariance' (A'*A) will have a banded structure and so the least squares problem can be solved efficiently. Why introduce f? Well, if we didn't have f (or set it to 0) we could minimise S by setting d[i] = -y[i], getting a perfectly smooth curve s[] = 0, which has nothing to do with the y curve. On the other hand if f is gigantic, then to minimise s we should concentrate on the second term, and set d[i] = 0, and our 'smoothed' curve is just the original. So it's reasonable to suppose that as we vary f, the corresponding solutions will vary between being very smooth but far from y (small f) and being close to y but a bit rough (large f).
It's often said that the normal equations, whose use I advocate here, are a bad way to solve least squares problems, and this is generally true. However with 'nice' banded systems -- like the one here -- the loss of stability through using the normal equations is not so great, while the gain in speed is so great. I've used this approach to smooth curves with many thousands of points in a reasonable time.
To see what A is, consider the case where we had 4 points. Then our expression for S comes down to:
sqr( s[2] - 2*s[1] + s[0]) + sqr( s[3] - 2*s[2] + s[1]) + f*(d[0]*d[0] + .. + d[3]*d[3]).
If we substitute s[i] = y[i] + d[i] in this we get, for example,
s[2] - 2*s[1] + s[0] = d[2]-2*d[1]+d[0] + y[2]-2*y[1]+y[0]
and so we see that for this to be sqr( ||A*d-b||) we should take
A = ( 1 -2 1 0)
( 0 1 -2 1)
( f 0 0 0)
( 0 f 0 0)
( 0 0 f 0)
( 0 0 0 f)
and
b = ( -(y[2]-2*y[1]+y[0]))
( -(y[3]-2*y[2]+y[1]))
( 0 )
( 0 )
( 0 )
( 0 )
In an implementation, though, you probably wouldn't want to form A and b, as they are only going to be used to form the normal equation terms, A'*A and A'*b. It would be simpler to accumulate these directly.

This is a constrained optimization problem. The functional to be minimized is the integrated difference of the original curve and the modified curve. The constraints are the area under the curve and the new locations of the modified points. It is not easy to write such codes on your own. It is better to use some open source optimization codes, like this one: ool.

what about to keep the same dynamic range?
compute original min0,max0 y-values
smooth y-values
compute new min1,max1 y-values
linear interpolate all values to match original min max y
y=min1+(y-min1)*(max0-min0)/(max1-min1)
that is it
Not sure for the area but this should keep the shape much closer to original one. I got this Idea right now while reading your question and now I face similar problem so I try to code it and try right now anyway +1 for the getting me this Idea :)
You can adapt this and combine with the area
So before this compute the area and apply #1..#4 and after that compute new area. Then multiply all values by old_area/new_area ratio. If you have also negative values and not computing absolute area then you have to handle positive and negative areas separately and find multiplication ration to best fit original area for booth at once.
[edit1] some results for constant dynamic range
As you can see the shape is slightly shifting to the left. Each image is after applying few hundreds smooth operations. I am thinking of subdivision to local min max intervals to improve this ...
[edit2] have finished the filter for mine own purposes
void advanced_smooth(double *p,int n)
{
int i,j,i0,i1;
double a0,a1,b0,b1,dp,w0,w1;
double *p0,*p1,*w; int *q;
if (n<3) return;
p0=new double[n<<2]; if (p0==NULL) return;
p1=p0+n;
w =p1+n;
q =(int*)((double*)(w+n));
// compute original min,max
for (a0=p[0],i=0;i<n;i++) if (a0>p[i]) a0=p[i];
for (a1=p[0],i=0;i<n;i++) if (a1<p[i]) a1=p[i];
for (i=0;i<n;i++) p0[i]=p[i]; // store original values for range restoration
// compute local min max positions to p1[]
dp=0.01*(a1-a0); // min delta treshold
// compute first derivation
p1[0]=0.0; for (i=1;i<n;i++) p1[i]=p[i]-p[i-1];
for (i=1;i<n-1;i++) // eliminate glitches
if (p1[i]*p1[i-1]<0.0)
if (p1[i]*p1[i+1]<0.0)
if (fabs(p1[i])<=dp)
p1[i]=0.5*(p1[i-1]+p1[i+1]);
for (i0=1;i0;) // remove zeros from derivation
for (i0=0,i=0;i<n;i++)
if (fabs(p1[i])<dp)
{
if ((i> 0)&&(fabs(p1[i-1])>=dp)) { i0=1; p1[i]=p1[i-1]; }
else if ((i<n-1)&&(fabs(p1[i+1])>=dp)) { i0=1; p1[i]=p1[i+1]; }
}
// find local min,max to q[]
q[n-2]=0; q[n-1]=0; for (i=1;i<n-1;i++) if (p1[i]*p1[i-1]<0.0) q[i-1]=1; else q[i-1]=0;
for (i=0;i<n;i++) // set sign as +max,-min
if ((q[i])&&(p1[i]<-dp)) q[i]=-q[i]; // this shifts smooth curve to the left !!!
// compute weights
for (i0=0,i1=1;i1<n;i0=i1,i1++) // loop through all local min,max intervals
{
for (;(!q[i1])&&(i1<n-1);i1++); // <i0,i1>
b0=0.5*(p[i0]+p[i1]);
b1=fabs(p[i1]-p[i0]);
if (b1>=1e-6)
for (b1=0.35/b1,i=i0;i<=i1;i++) // compute weights bigger near local min max
w[i]=0.8+(fabs(p[i]-b0)*b1);
}
// smooth few times
for (j=0;j<5;j++)
{
for (i=0;i<n ;i++) p1[i]=p[i]; // store data to avoid shifting by using half filtered data
for (i=1;i<n-1;i++) // FIR smooth filter
{
w0=w[i];
w1=(1.0-w0)*0.5;
p[i]=(w1*p1[i-1])+(w0*p1[i])+(w1*p1[i+1]);
}
for (i=1;i<n-1;i++) // avoid local min,max shifting too much
{
if (q[i]>0) // local max
{
if (p[i]<p[i-1]) p[i]=p[i-1]; // can not be lower then neigbours
if (p[i]<p[i+1]) p[i]=p[i+1];
}
if (q[i]<0) // local min
{
if (p[i]>p[i-1]) p[i]=p[i-1]; // can not be higher then neigbours
if (p[i]>p[i+1]) p[i]=p[i+1];
}
}
}
for (i0=0,i1=1;i1<n;i0=i1,i1++) // loop through all local min,max intervals
{
for (;(!q[i1])&&(i1<n-1);i1++); // <i0,i1>
// restore original local min,max
a0=p0[i0]; b0=p[i0];
a1=p0[i1]; b1=p[i1];
if (a0>a1)
{
dp=a0; a0=a1; a1=dp;
dp=b0; b0=b1; b1=dp;
}
b1-=b0;
if (b1>=1e-6)
for (dp=(a1-a0)/b1,i=i0;i<=i1;i++)
p[i]=a0+((p[i]-b0)*dp);
}
delete[] p0;
}
so p[n] is the input/output data. There are few things that can be tweaked like:
weights computation (constants 0.8 and 0.35 means weights are <0.8,0.8+0.35/2>)
number of smooth passes (now 5 in the for loop)
the bigger the weight the less the filtering 1.0 means no change
The main Idea behind is:
find local extremes
compute weights for smoothing
so near local extremes are almost none change of the output
smooth
repair dynamic range per each interval between all local extremes
[Notes]
I did also try to restore the area but that is incompatible with mine task because it distorts the shape a lot. So if you really need the area then focus on that and not on the shape. The smoothing causes signal to shrink mostly so after area restoration the shape rise on magnitude.
Actual filter state has none markable side shifting of shape (which was the main goal for me). Some images for more bumpy signal (the original filter was extremly poor on this):
As you can see no visible signal shape shifting. The local extremes has tendency to create sharp spikes after very heavy smoothing but that was expected
Hope it helps ...

Finding translation and scale on two sets of points to get least square error in their distance?

I have two sets of 3D points (original and reconstructed) and correspondence information about pairs - which point from one set represents the second one. I need to find 3D translation and scaling factor which transforms reconstruct set so the sum of square distances would be least (rotation would be nice too, but points are rotated similarly, so this is not main priority and might be omitted in sake of simplicity and speed). And so my question is - is this solved and available somewhere on the Internet? Personally, I would use least square method, but I don't have much time (and although I'm somewhat good at math, I don't use it often, so it would be better for me to avoid it), so I would like to use other's solution if it exists. I prefer solution in C++, for example using OpenCV, but algorithm alone is good enough.
If there is no such solution, I will calculate it by myself, I don't want to bother you so much.
SOLUTION: (from your answers)
For me it's Kabsch alhorithm;
Base info: http://en.wikipedia.org/wiki/Kabsch_algorithm
General solution: http://nghiaho.com/?page_id=671
STILL NOT SOLVED:
I also need scale. Scale values from SVD are not understandable for me; when I need scale about 1-4 for all axises (estimated by me), SVD scale is about [2000, 200, 20], which is not helping at all.

Since you are already using Kabsch algorithm, just have a look at Umeyama's paper which extends it to get scale. All you need to do is to get the standard deviation of your points and calculate scale as:
(1/sigma^2)*trace(D*S)
where D is the diagonal matrix in SVD decomposition in the rotation estimation and S is either identity matrix or [1 1 -1] diagonal matrix, depending on the sign of determinant of UV (which Kabsch uses to correct reflections into proper rotations). So if you have [2000, 200, 20], multiply the last element by +-1 (depending on the sign of determinant of UV), sum them and divide by the standard deviation of your points to get scale.
You can recycle the following code, which is using the Eigen library:
typedef Eigen::Matrix<double, 3, 1, Eigen::DontAlign> Vector3d_U; // microsoft's 32-bit compiler can't put Eigen::Vector3d inside a std::vector. for other compilers or for 64-bit, feel free to replace this by Eigen::Vector3d
/**
* #brief rigidly aligns two sets of poses
*
* This calculates such a relative pose <tt>R, t</tt>, such that:
*
* #code
* _TyVector v_pose = R * r_vertices[i] + t;
* double f_error = (r_tar_vertices[i] - v_pose).squaredNorm();
* #endcode
*
* The sum of squared errors in <tt>f_error</tt> for each <tt>i</tt> is minimized.
*
* #param[in] r_vertices is a set of vertices to be aligned
* #param[in] r_tar_vertices is a set of vertices to align to
*
* #return Returns a relative pose that rigidly aligns the two given sets of poses.
*
* #note This requires the two sets of poses to have the corresponding vertices stored under the same index.
*/
static std::pair<Eigen::Matrix3d, Eigen::Vector3d> t_Align_Points(
const std::vector<Vector3d_U> &r_vertices, const std::vector<Vector3d_U> &r_tar_vertices)
{
_ASSERTE(r_tar_vertices.size() == r_vertices.size());
const size_t n = r_vertices.size();
Eigen::Vector3d v_center_tar3 = Eigen::Vector3d::Zero(), v_center3 = Eigen::Vector3d::Zero();
for(size_t i = 0; i < n; ++ i) {
v_center_tar3 += r_tar_vertices[i];
v_center3 += r_vertices[i];
}
v_center_tar3 /= double(n);
v_center3 /= double(n);
// calculate centers of positions, potentially extend to 3D
double f_sd2_tar = 0, f_sd2 = 0; // only one of those is really needed
Eigen::Matrix3d t_cov = Eigen::Matrix3d::Zero();
for(size_t i = 0; i < n; ++ i) {
Eigen::Vector3d v_vert_i_tar = r_tar_vertices[i] - v_center_tar3;
Eigen::Vector3d v_vert_i = r_vertices[i] - v_center3;
// get both vertices
f_sd2 += v_vert_i.squaredNorm();
f_sd2_tar += v_vert_i_tar.squaredNorm();
// accumulate squared standard deviation (only one of those is really needed)
t_cov.noalias() += v_vert_i * v_vert_i_tar.transpose();
// accumulate covariance
}
// calculate the covariance matrix
Eigen::JacobiSVD<Eigen::Matrix3d> svd(t_cov, Eigen::ComputeFullU | Eigen::ComputeFullV);
// calculate the SVD
Eigen::Matrix3d R = svd.matrixV() * svd.matrixU().transpose();
// compute the rotation
double f_det = R.determinant();
Eigen::Vector3d e(1, 1, (f_det < 0)? -1 : 1);
// calculate determinant of V*U^T to disambiguate rotation sign
if(f_det < 0)
R.noalias() = svd.matrixV() * e.asDiagonal() * svd.matrixU().transpose();
// recompute the rotation part if the determinant was negative
R = Eigen::Quaterniond(R).normalized().toRotationMatrix();
// renormalize the rotation (not needed but gives slightly more orthogonal transformations)
double f_scale = svd.singularValues().dot(e) / f_sd2_tar;
double f_inv_scale = svd.singularValues().dot(e) / f_sd2; // only one of those is needed
// calculate the scale
R *= f_inv_scale;
// apply scale
Eigen::Vector3d t = v_center_tar3 - (R * v_center3); // R needs to contain scale here, otherwise the translation is wrong
// want to align center with ground truth
return std::make_pair(R, t); // or put it in a single 4x4 matrix if you like
}

For 3D points the problem is known as the Absolute Orientation problem. A c++ implementation is available from Eigen http://eigen.tuxfamily.org/dox/group__Geometry__Module.html#gab3f5a82a24490b936f8694cf8fef8e60 and paper http://web.stanford.edu/class/cs273/refs/umeyama.pdf
you can use it via opencv by converting the matrices to eigen with cv::cv2eigen() calls.

Start with translation of both sets of points. So that their centroid coincides with the origin of the coordinate system. Translation vector is just the difference between these centroids.
Now we have two sets of coordinates represented as matrices P and Q. One set of points may be obtained from other one by applying some linear operator (which performs both scaling and rotation). This operator is represented by 3x3 matrix X:
P * X = Q
To find proper scale/rotation we just need to solve this matrix equation, find X, then decompose it into several matrices, each representing some scaling or rotation.
A simple (but probably not numerically stable) way to solve it is to multiply both parts of the equation to the transposed matrix P (to get rid of non-square matrices), then multiply both parts of the equation to the inverted PT * P:
PT * P * X = PT * Q
X = (PT * P)-1 * PT * Q
Applying Singular value decomposition to matrix X gives two rotation matrices and a matrix with scale factors:
X = U * S * V
Here S is a diagonal matrix with scale factors (one scale for each coordinate), U and V are rotation matrices, one properly rotates the points so that they may be scaled along the coordinate axes, other one rotates them once more to align their orientation to second set of points.
Example (2D points are used for simplicity):
P = 1 2 Q = 7.5391 4.3455
2 3 12.9796 5.8897
-2 1 -4.5847 5.3159
-1 -6 -15.9340 -15.5511
After solving the equation:
X = 3.3417 -1.2573
2.0987 2.8014
After SVD decomposition:
U = -0.7317 -0.6816
-0.6816 0.7317
S = 4 0
0 3
V = -0.9689 -0.2474
-0.2474 0.9689
Here SVD has properly reconstructed all manipulations I performed on matrix P to get matrix Q: rotate by the angle 0.75, scale X axis by 4, scale Y axis by 3, rotate by the angle -0.25.
If sets of points are scaled uniformly (scale factor is equal by each axis), this procedure may be significantly simplified.
Just use Kabsch algorithm to get translation/rotation values. Then perform these translation and rotation (centroids should coincide with the origin of the coordinate system). Then for each pair of points (and for each coordinate) estimate Linear regression. Linear regression coefficient is exactly the scale factor.

A good explanation Finding optimal rotation and translation between corresponding 3D points
The code is in matlab but it's trivial to convert to opengl using the cv::SVD function

You might want to try ICP (Iterative closest point).
Given two sets of 3d points, it will tell you the transformation (rotation + translation) to go from the first set to the second one.
If you're interested in a c++ lightweight implementation, try libicp.
Good luck!

The general transformation, as well the scale can be retrieved via Procrustes Analysis. It works by superimposing the objects on top of each other and tries to estimate the transformation from that setting. It has been used in the context of ICP, many times. In fact, your preference, Kabash algorithm is a special case of this.
Moreover, Horn's alignment algorithm (based on quaternions) also finds a very good solution, while being quite efficient. A Matlab implementation is also available.

Scale can be inferred without SVD, if your points are uniformly scaled in all directions (I could not make sense of SVD-s scale matrix either). Here is how I solved the same problem:
Measure distances of each point to other points in the point cloud to get a 2d table of distances, where entry at (i,j) is norm(point_i-point_j). Do the same thing for the other point cloud, so you get two tables -- one for original and the other for reconstructed points.
Divide all values in one table by the corresponding values in the other table. Because the points correspond to each other, the distances do too. Ideally, the resulting table has all values being equal to each other, and this is the scale.
The median value of the divisions should be pretty close to the scale you are looking for. The mean value is also close, but I chose median just to exclude outliers.
Now you can use the scale value to scale all the reconstructed points and then proceed to estimating the rotation.
Tip: If there are too many points in the point clouds to find distances between all of them, then a smaller subset of distances will work, too, as long as it is the same subset for both point clouds. Ideally, just one distance pair would work if there is no measurement noise, e.g when one point cloud is directly derived from the other by just rotating it.

you can also use ScaleRatio ICP proposed by BaoweiLin
The code can be found in github

random unit vector in multi-dimensional space

I'm working on a data mining algorithm where i want to pick a random direction from a particular point in the feature space.
If I pick a random number for each of the n dimensions from [-1,1] and then normalize the vector to a length of 1 will I get an even distribution across all possible directions?
I'm speaking only theoretically here since computer generated random numbers are not actually random.

One simple trick is to select each dimension from a gaussian distribution, then normalize:
from random import gauss
def make_rand_vector(dims):
vec = [gauss(0, 1) for i in range(dims)]
mag = sum(x**2 for x in vec) ** .5
return [x/mag for x in vec]
For example, if you want a 7-dimensional random vector, select 7 random values (from a Gaussian distribution with mean 0 and standard deviation 1). Then, compute the magnitude of the resulting vector using the Pythagorean formula (square each value, add the squares, and take the square root of the result). Finally, divide each value by the magnitude to obtain a normalized random vector.
If your number of dimensions is large then this has the strong benefit of always working immediately, while generating random vectors until you find one which happens to have magnitude less than one will cause your computer to simply hang at more than a dozen dimensions or so, because the probability of any of them qualifying becomes vanishingly small.

You will not get a uniformly distributed ensemble of angles with the algorithm you described. The angles will be biased toward the corners of your n-dimensional hypercube.
This can be fixed by eliminating any points with distance greater than 1 from the origin. Then you're dealing with a spherical rather than a cubical (n-dimensional) volume, and your set of angles should then be uniformly distributed over the sample space.
Pseudocode:
Let n be the number of dimensions, K the desired number of vectors:
vec_count=0
while vec_count < K
generate n uniformly distributed values a[0..n-1] over [-1, 1]
r_squared = sum over i=0,n-1 of a[i]^2
if 0 < r_squared <= 1.0
b[i] = a[i]/sqrt(r_squared) ; normalize to length of 1
add vector b[0..n-1] to output list
vec_count = vec_count + 1
else
reject this sample
end while

There is a boost implementation of the algorithm that samples from normal distributions: random::uniform_on_sphere

I had the exact same question when also developing a ML algorithm.
I got to the same conclusion as Jim Lewis after drawing samples for the 2-d case and plotting the resulting distribution of the angle.
Furthermore, if you try to derive the density distribution for the direction in 2d when you draw at random from [-1,1] for the x- and y-axis ,you will see that:
f_X(x) = 1/(4*cos²(x)) if 0 < x < 45⁰
and
f_X(x) = 1/(4*sin²(x)) if x > 45⁰
where x is the angle, and f_X is the probability density distribution.
I have written about this here:
https://aerodatablog.wordpress.com/2018/01/14/random-hyperplanes/

#define SCL1 (M_SQRT2/2)
#define SCL2 (M_SQRT2*2)
// unitrand in [-1,1].
double u = SCL1 * unitrand();
double v = SCL1 * unitrand();
double w = SCL2 * sqrt(1.0 - u*u - v*v);
double x = w * u;
double y = w * v;
double z = 1.0 - 2.0 * (u*u + v*v);

Averaging angles... Again

I want to calculate the average of a set of angles, which represents source bearing (0 to 360 deg) - (similar to wind-direction)
I know it has been discussed before (several times). The accepted answer was Compute unit vectors from the angles and take the angle of their average.
However this answer defines the average in a non intuitive way. The average of 0, 0 and 90 will be atan( (sin(0)+sin(0)+sin(90)) / (cos(0)+cos(0)+cos(90)) ) = atan(1/2)= 26.56 deg
I would expect the average of 0, 0 and 90 to be 30 degrees.
So I think it is fair to ask the question again: How would you calculate the average, so such examples will give the intuitive expected answer.
Edit 2014:
After asking this question, I've posted an article on CodeProject which offers a thorough analysis. The article examines the following reference problems:
Given time-of-day [00:00-24:00) for each birth occurred in US in the year 2000 - Calculate the mean birth time-of-day
Given a multiset of direction measurements from a stationary transmitter to a stationary receiver, using a measurement technique with a wrapped normal distributed error – Estimate the direction.
Given a multiset of azimuth estimates between two points, made by “ordinary” humans (assuming to subject to a wrapped truncated normal distributed error) – Estimate the direction.

[Note the OP's question (but not title) appears to have changed to a rather specialised question ("...the average of a SEQUENCE of angles where each successive addition does not differ from the running mean by more than a specified amount." ) - see #MaR comment and mine. My following answer addresses the OP's title and the bulk of the discussion and answers related to it.]
This is not a question of logic or intuition, but of definition. This has been discussed on SO before without any real consensus. Angles should be defined within a range (which might be -PI to +PI, or 0 to 2*PI or might be -Inf to +Inf. The answers will be different in each case.
The word "angle" causes confusion as it means different things. The angle of view is an unsigned quantity (and is normally PI > theta > 0. In that cases "normal" averages might be useful. Angle of rotation (e.g. total rotation if an ice skater) might or might not be signed and might include theta > 2PI and theta < -2PI.
What is defined here is angle = direction whihch requires vectors. If you use the word "direction" instead of "angle" you will have captured the OP's (apparent original) intention and it will help to move away from scalar quantities.
Wikipedia shows the correct approach when angles are defined circularly such that
theta = theta+2*PI*N = theta-2*PI*N
The answer for the mean is NOT a scalar but a vector. The OP may not feel this is intuitive but it is the only useful correct approach. We cannot redefine the square root of -4 to be -2 because it's more initutive - it has to be +-2*i. Similarly the average of bearings -90 degrees and +90 degrees is a vector of zero length, not 0.0 degrees.
Wikipedia (http://en.wikipedia.org/wiki/Mean_of_circular_quantities) has a special section and states (The equations are LaTeX and can be seen rendered in Wikipedia):
Most of the usual means fail on
circular quantities, like angles,
daytimes, fractional parts of real
numbers. For those quantities you need
a mean of circular quantities.
Since the arithmetic mean is not
effective for angles, the following
method can be used to obtain both a
mean value and measure for the
variance of the angles:
Convert all angles to corresponding
points on the unit circle, e.g., α to
(cosα,sinα). That is convert polar
coordinates to Cartesian coordinates.
Then compute the arithmetic mean of
these points. The resulting point will
lie on the unit disk. Convert that
point back to polar coordinates. The
angle is a reasonable mean of the
input angles. The resulting radius
will be 1 if all angles are equal. If
the angles are uniformly distributed
on the circle, then the resulting
radius will be 0, and there is no
circular mean. In other words, the
radius measures the concentration of
the angles.
Given the angles
\alpha_1,\dots,\alpha_n the mean is
computed by
M \alpha = \operatorname{atan2}\left(\frac{1}{n}\cdot\sum_{j=1}^n
\sin\alpha_j,
\frac{1}{n}\cdot\sum_{j=1}^n
\cos\alpha_j\right)
using the atan2 variant of the
arctangent function, or
M \alpha = \arg\left(\frac{1}{n}\cdot\sum_{j=1}^n
\exp(i\cdot\alpha_j)\right)
using complex numbers.
Note that in the OP's question an angle of 0 is purely arbitrary - there is nothing special about wind coming from 0 as opposed to 180 (except in this hemisphere it's colder on the bicycle). Try changing 0,0,90 to 289, 289, 379 and see how the simple arithmetic no longer works.
(There are some distributions where angles of 0 and PI have special significance but they are not in scope here).
Here are some intense previous discussions which mirror the current spread of views :-)
Link
How do you calculate the average of a set of circular data?
http://forums.xkcd.com/viewtopic.php?f=17&t=22435
http://www.allegro.cc/forums/thread/595008

Thank you all for helping me see my problem more clearly.
I found what I was looking for.
It is called Mitsuta method.
The inputs and output are in the range [0..360).
This method is good for averaging data that was sampled using constant sampling intervals.
The method assumes that the difference between successive samples is less than 180 degrees (which means that if we won't sample fast enough, a 330 degrees change in the sampled signal would be incorrectly detected as a 30 degrees change in the other direction and will insert an error into the calculation). Nyquist–Shannon sampling theorem anybody ?
Here is a c++ code:
double AngAvrg(const vector<double>& Ang)
{
vector<double>::const_iterator iter= Ang.begin();
double fD = *iter;
double fSigD= *iter;
while (++iter != Ang.end())
{
double fDelta= *iter - fD;
if (fDelta < -180.) fD+= fDelta + 360.;
else if (fDelta > 180.) fD+= fDelta - 360.;
else fD+= fDelta ;
fSigD+= fD;
}
double fAvrg= fSigD / Ang.size();
if (fAvrg >= 360.) return fAvrg -360.;
if (fAvrg < 0. ) return fAvrg +360.;
return fAvrg ;
}
It is explained on page 51 of Meteorological Monitoring Guidance for Regulatory Modeling Applications (PDF)(171 pp, 02-01-2000, 454-R-99-005)
Thank you MaR for sending the link as a comment.
If the sampled data is constant, but our sampling device has an inaccuracy with a Von Mises distribution, a unit-vectors calculation will be appropriate.

This is incorrect on every level.
Vectors add according to the rules of vector addition. The "intuitive, expected" answer might not be that intuitive.
Take the following example. If I have one unit vector (1, 0), with origin at (0,0) that points in the +x-direction and another (-1, 0) that also has its origin at (0,0) that points in the -x-direction, what should the "average" angle be?
If I simply add the angles and divide by two, I can argue that the "average" is either +90 or -90. Which one do you think it should be?
If I add the vectors according to the rules of vector addition (component by component), I get the following:
(1, 0) + (-1, 0) = (0, 0)
In polar coordinates, that's a vector with zero magnitude and angle zero.
So what should the "average" angle be? I've got three different answers here for a simple case.
I think the answer is that vectors don't obey the same intuition that numbers do, because they have both magnitude and direction. Maybe you should describe what problem you're solving a bit better.
Whatever solution you decide on, I'd advise you to base it on vectors. It'll always be correct that way.

What does it even mean to average source bearings? Start by answering that question, and you'll get closer to being to define what you mean by the average of angles.
In my mind, an angle with tangent equal to 1/2 is the right answer. If I have a unit force pushing me in the direction of the vector (1, 0), another force pushing me in the direction of the vector (1, 0) and third force pushing me in the direction of the vector (0, 1), then the resulting force (the sum of these forces) is the force pushing me in the direction of (1, 2). These the the vectors representing the bearings 0 degrees, 0 degrees and 90 degrees. The angle represented by the vector (1, 2) has tangent equal to 1/2.
Responding to your second edit:
Let's say that we are measuring wind direction. Our 3 measurements were 0, 0, and 90 degrees. Since all measurements are equivalently reliable, why shouldn't our best estimate of the wind direction be 30 degrees? setting it to 25.56 degrees is a bias toward 0...
Okay, here's an issue. The unit vector with angle 0 doesn't have the same mathematical properties that the real number 0 has. Using the notation 0v to represent the vector with angle 0, note that
0v + 0v = 0v
is false but
0 + 0 = 0
is true for real numbers. So if 0v represents wind with unit speed and angle 0, then 0v + 0v is wind with double unit speed and angle 0. And then if we have a third wind vector (which I'll representing using the notation 90v) which has angle 90 and unit speed, then the wind that results from the sum of these vectors does have a bias because it's traveling at twice unit speed in the horizontal direction but only unit speed in the vertical direction.

In my opinion, this is about angles, not vectors. For that reason the average of 360 and 0 is truly 180.
The average of one turn and no turns should be half a turn.

Edit: Equivalent, but more robust algorithm (and simpler):
divide angles into 2 groups, [0-180) and [180-360)
numerically average both groups
average the 2 group averages with proper weighting
if wraparound occurred, correct by 180˚
This works because number averaging works "logically" if all the angles are in the same hemicircle. We then delay getting wraparound error until the very last step, where it is easily detected and corrected. I also threw in some code for handling opposite angle cases. If the averages are opposite we favor the hemisphere that had more angles in it, and in the case of equal angles in both hemispheres we return None because no average would make sense.
The new code:
def averageAngles2(angles):
newAngles = [a % 360 for a in angles];
smallAngles = []
largeAngles = []
# split the angles into 2 groups: [0-180) and [180-360)
for angle in newAngles:
if angle < 180:
smallAngles.append(angle)
else:
largeAngles.append(angle)
smallCount = len(smallAngles)
largeCount = len(largeAngles)
#averaging each of the groups will work with standard averages
smallAverage = sum(smallAngles) / float(smallCount) if smallCount else 0
largeAverage = sum(largeAngles) / float(largeCount) if largeCount else 0
if smallCount == 0:
return largeAverage
if largeCount == 0:
return smallAverage
average = (smallAverage * smallCount + largeAverage * largeCount) / \
float(smallCount + largeCount)
if largeAverage < smallAverage + 180:
# average will not hit wraparound
return average
elif largeAverage > smallAverage + 180:
# average will hit wraparound, so will be off by 180 degrees
return (average + 180) % 360
else:
# opposite angles: return whichever has more weight
if smallCount > largeCount:
return smallAverage
elif smallCount < largeCount:
return largeAverage
else:
return None
>>> averageAngles2([0, 0, 90])
30.0
>>> averageAngles2([30, 350])
10.0
>>> averageAngles2([0, 200])
280.0
Here's a slightly naive algorithm:
remove all oposite angles from the list
take a pair of angles
rotate them to the first and second quadrant and average them
rotate average angle back by same amount
for each remaining angle, average in same way, but with successively increasing weight to the composite angle
some python code (step 1 not implemented)
def averageAngles(angles):
newAngles = [a % 360 for a in angles];
average = 0
weight = 0
for ang in newAngles:
theta = 0
if 0 < ang - average <= 180:
theta = 180 - ang
else:
theta = 180 - average
r_ang = (ang + theta) % 360
r_avg = (average + theta) % 360
average = ((r_avg * weight + r_ang) / float(weight + 1) - theta) % 360
weight += 1
return average

Here's the answer I gave to this same question:
How do you calculate the average of a set of circular data?
It gives answers inline with what the OP says he wants, but attention should be paid to this:
"I would also like to stress that even though this is a true average of angles, unlike the vector solutions, that does not necessarily mean it is the solution you should be using, the average of the corresponding unit vectors may well be the value you actually should to be using."

You are correct that the accepted answer of using traditional average is wrong.
An average of a set of points x_1 ... x_n in a metric space X is an element x in X that minimizes the sum of distances squares to each point (See Frechet mean). If you try to find this minimum using simple calculus with regular real numbers, you will recover the standard "add up and divide by n" formula.
For an angle, our elements are actually points on the unit circle S1. Our metric isn't euclidean distance, but arc length, which is proportional to angle.
So, the average angle is the one that minimizes the square of the angle difference between each other angle. In other words,
if you have a function angleBetween(a, b) you want to find the angle a
such that sum over i of angleBetween(a_i, a) is minimized.
This is an optimization problem which can be solved using a numerical optimizer. Several of the answers here claim to provide simpler closed forms, or at least better approximations.
Statistics
As you point out in your article, you need to assume errors follow a Gaussian distribution to justify using least squares as the maximum likelyhood estimator. So in this application, where is the error? Is the random error in the position of two things, and the angle is just the normal of the line between them? If so, that normal will not follow a Gaussian distribution, even if the error in point position does. Taking means of angles only really makes sense if the random error is observed in the angle itself.

You could do this: Say you have a set of angles in an array angle, then to compute the array first do: angle[i] = angle[i] mod 360, now perform a simple average over the array. So when you have 360, 10, 20, you are averaging 0, 10 and 20 - the results are intuitive.

What is wrong with taking the set of angles as real values and just computing the arithmetic average of those numbers? Then you would get the intuitive (0+0+90)/3 = 30 deg.
Edit: Thanks for useful comments and pointing out that angles may exceed 360. I believe the answer could be the normal arithmetic average reduced "modulo" 360: we sum all the values, divide by the number of angles and then subtract/add a multiple of 360 so that the result lies in the interval [0..360).

I think the problem stems from how you treat angles greater than 180 (and those greater than 360 as well). If you reduce the angles to a range of +180 to -180 before adding them to the total, you get something more reasonable:
int AverageOfAngles(int angles[], int count)
{
int total = 0;
for (int index = 0; index < count; index++)
{
int angle = angles[index] % 360;
if (angle > 180) { angle -= 360; }
total += angle;
}
return (int)((float)total/count);
}

Maybe you could represent angles as quaternions and take average of these quaternions and convert it back to angle.
I don't know If it gives you what you want because quaternions are rather rotations than angles. I also don't know if it will give you anything different from vector solution.
Quaternions in 2D simplify to complex numbers so I guess It's just vectors but maybe some interesting quaternion averaging algorithm like http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20070017872_2007014421.pdf when simplified to 2D will behave better than just vector average.

Here you go! The reference is https://www.wxforum.net/index.php?topic=8660.0
def avgWind(directions):
sinSum = 0
cosSum = 0
d2r = math.pi/180 #degree to radian
r2d = 180/math.pi
for i in range(len(directions)):
sinSum += math.sin(directions[i]*d2r)
cosSum += math.cos(directions[i]*d2r)
return ((r2d*(math.atan2(sinSum, cosSum)) + 360) % 360)
a= np.random.randint(low=0, high=360, size=6)
print(a)
avgWind(a)

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio