Algorithm: Find 2d orientation from constellation of known points?

Given a set of known cartesian points (set A), and a 2d transformation (rotation, translation, scale) of some subset of those points (set B), find the orientation of the subset (rotation, translation, scale) relative to the original set of points.
I.e. suppose I take a "picture" of a known set of 2d points on a wall. I want to know what position the camera was in relative to "upright and centered" when the picture was taken. Some of the points may not be visible in the picture (they may be occluded). (In this analogy, assume the camera is orthogonal to the wall and always pointed directly at its plane, so you don't need to take distortion or perspective into account.)
Proposed approach:
Step 1: Scale B to the same "range" as A
Don't know how; open to suggestions. Maybe take the area of a convex hull around all the points in B, and scale it to nearly that of the convex hull around A. This is tricky, because points may be missing from B.
Step 2: Match some arbitrary point in "B" to its twin in "A"
Pick some random point in set B. Call this point K. Somehow take a "fingerprint" of K relative to all the other points in B (using distance only). Find its match in A by fingerprinting all points in A and taking the point whose fingerprint is most similar to K's.
Step 3: Rotate B (around K) until all points in B are aligned with a point in A
Multiple solutions are possible, so keep rotating through 360 degrees looking for solutions.
That's just shooting from the hip; I may be way off base. Anyone have any ideas?

Assuming you don't actually know the correspondence between the points in the two clouds, you could try a statistical approach.
First, compute the mean x0 of the original cloud, then compute the mean x1 of the subset cloud. The difference of the mean vectors, x1-x0, is a good estimate of the required translation.
Now, subtract the relevant mean vector from each set to give two clouds centered at the origin. Compute the covariance matrix for each cloud and find its eigenvalues and eigenvectors. The required rotation can be found from the eigenvectors, while the scaling follows from the eigenvalues (the square root of their ratio, since variances grow with the square of the scale factor).
Compose all of this and you should have a good statistical estimate of the desired transform. Obviously, its quality will be a function of how well the subset spans the original set.
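Here is a minimal sketch of this statistical estimate. It assumes 2D points given as (x, y) tuples and a cloud that is clearly elongated. If the two covariance eigenvalues are equal, the principal axis, and hence the rotation, is undefined.

For a 2x2 covariance matrix the eigen-decomposition has a closed form, so no linear-algebra library is needed. Two details are worth noting:

- The rotation comes out only modulo 180 degrees, because an eigenvector has no preferred sign.
- The translation is taken as the one that carries the rotated and scaled centroid of A onto the centroid of B. This matches x1-x0 when rotation and scaling are applied about the centroid.

The function names are illustrative.

```python
import math

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def principal_axis(points):
    """Angle of the major covariance eigenvector, and its (largest) eigenvalue."""
    mx, my = centroid(points)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form eigen-decomposition of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    largest = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)
    return angle, largest

def estimate_transform(a, b):
    """Estimate (rotation, scale, translation) with b roughly equal to scale * R(rotation) * a + t."""
    angle_a, lam_a = principal_axis(a)
    angle_b, lam_b = principal_axis(b)
    rotation = angle_b - angle_a          # only determined modulo pi (axis has no sign)
    scale = math.sqrt(lam_b / lam_a)      # variances grow with the square of the scale
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    c, s = math.cos(rotation), math.sin(rotation)
    # Translation that carries the rotated, scaled centroid of a onto the centroid of b.
    t = (bx - scale * (c * ax - s * ay), by - scale * (s * ax + c * ay))
    return rotation, scale, t
```

With a partial subset B, the centroid and covariance of B drift away from those of A. Treat the result as an initial guess whose quality depends on how well B spans A.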

"Give me a place to stand on, and I will move the Earth" Archimedes
I think we should follow in the footsteps of Archimedes.
Arpi's algorithm:
Choose a point X1 of set A and give it coordinates (0, 0). (This will be the place to stand on.)
Choose another point X2 and put it on the x axis (OX) to simplify things.
All the other points' coordinates in set A are then calculated relative to X1(0, 0) and X2(some_coordinate, 0).
Now choose a point Y1 of set B as the center of set B, and another point Y2 of set B to put on the x axis of set B. This gives us a scale factor and a rotation angle. If this is a solution, then Y1 in set B represents X1 in set A, and Y2 in set B represents X2 in set A. Suppose that, on this basis, we can map all the points of set B onto points of set A, with distinct points of B mapping to distinct points of A (Yi <> Yj if i <> j, where i and j are the indexes of the points in our representation). Then we have a potential solution, and we store it.
End of Arpi's algorithm
To find all the potential solutions you must do the following:
foreach point in A as X1 do
    foreach point in A as X2 (X2 <> X1) do
        Arpi's algorithm(X1, X2)
Of course, you can optimize this, but for the sake of simplicity I described it without optimizations (complications). It's your job to optimize it, and only if you need to.
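Here is a minimal sketch of Arpi's algorithm. It represents 2D points as Python complex numbers, so that a rotation plus a scale is a single complex multiplier m.

The sketch follows the loops above. It fixes (Y1, Y2) as the first two points of B and tries every ordered pair (X1, X2) from A. Each hypothesis is scored by how many points of B land within a tolerance tol of some point of A, and the best one is kept.

The names and the tolerance are assumptions. With this naive scoring the cost is O(|A|^3 |B|).

```python
import cmath

def transform_from_pair(y1, y2, x1, x2):
    """Similarity z -> m*z + c sending y1 to x1 and y2 to x2 (points as complex numbers)."""
    m = (x2 - x1) / (y2 - y1)   # rotation and scale in one complex multiplier
    return m, x1 - m * y1

def rotation_and_scale(m):
    return cmath.phase(m), abs(m)

def count_matches(m, c, a, b, tol):
    """Number of B points that land within tol of some A point."""
    return sum(1 for y in b if any(abs(m * y + c - x) <= tol for x in a))

def find_alignment(a, b, tol=1e-6):
    """Try every ordered pair (X1, X2) of A as the image of (Y1, Y2) = (b[0], b[1])."""
    best = (0, None)
    for x1 in a:
        for x2 in a:
            if x1 == x2:
                continue
            m, c = transform_from_pair(b[0], b[1], x1, x2)
            matched = count_matches(m, c, a, b, tol)
            if matched > best[0]:
                best = (matched, (m, c))
    return best
```

Here c is the translation, and rotation_and_scale(m) gives the angle and scale factor. Symmetric constellations can produce several equally good hypotheses, as the question anticipates; this sketch keeps the first one it finds. Because the map is injective, two B points can only claim the same A point if they are closer together than about 2*tol/|m|.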

I would attempt to minimize the deviation between the target points and the found points. Meaning I would pair each target point with a found point, and apply any transformation (rotation, scale or skew) to all the target points which decreases the sum of the deviations. I would repeat this for all potential pairs, eventually taking the match to be the set of pairs and the necessary transformations with the smallest total deviation.
The real question is how to optimize this so that performance is better than O(n^2). I suppose some sort of heuristic matching, perhaps caching the intermediate results, or finding a way of eliminating some pairs earlier in the process.
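Once a pairing is fixed, the transformation that minimizes the sum of squared deviations has a closed form, provided it is restricted to rotation, uniform scale and translation. Skew, which the answer also allows, is left out here.

The sketch below uses complex numbers for 2D points and does two things:

- It computes that least-squares fit, applying the transformation to the target points as the answer describes.
- For small inputs only, it tries every assignment of target points to found points with itertools.permutations, keeping the one with the smallest total deviation.

This brute force is factorial, which is exactly why the answer asks about optimizing it. The function names are hypothetical.

```python
from itertools import permutations

def fit_similarity(src, dst):
    """Least-squares z -> m*z + c mapping src[k] onto dst[k] (complex points)."""
    n = len(src)
    ms, md = sum(src) / n, sum(dst) / n
    # Complex least squares on centred points: m = sum(d' * conj(s')) / sum(|s'|^2).
    num = sum((d - md) * (s - ms).conjugate() for s, d in zip(src, dst))
    den = sum(abs(s - ms) ** 2 for s in src)
    m = num / den
    c = md - m * ms
    deviation = sum(abs(m * s + c - d) ** 2 for s, d in zip(src, dst))
    return m, c, deviation

def best_pairing(found, targets):
    """Pair found points with distinct targets; keep the pairing with the smallest deviation."""
    best = None
    for chosen in permutations(targets, len(found)):
        m, c, dev = fit_similarity(list(chosen), list(found))   # move targets onto found
        if best is None or dev < best[0]:
            best = (dev, m, c, chosen)
    return best
```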


Algorithm: find minimum space spanning points defined only by their separations

I have a collection of points in some N-dimensional space, where all I know is the distances between them. Let's say it's an unordered collection of structs like the following:
struct Separation { // (struct name is illustrative)
int first; // Just some identifier that uniquely specifies a point
int second; // No importance to which point is first or second
float separation; // The distance between the first and second points -- always positive
};
Of course the algorithm doesn't have to be C code. I just wrote the struct in this style to make the problem clear. It rather upsets me that the struct spoils the symmetry between the two end-points, but fixing this just makes things more complicated.
Let's say that the separations are defined by the Pythagorean distance between them, and the space is Euclidean. Let's also specify that the separations are internally consistent. For example, given separations AB, BC and AC, we know that AB + BC >= AC.
I want an algorithm that finds the minimal dimensional space that can contain all the points. Within this algorithm, we can assume that separations that deviate from that defined by the space by less than some specified tolerance can be ignored.
Does anyone know an algorithm that does this? So far, I've only been able to think up non-polynomial algorithms. Can anybody improve on that, or at least come up with something that is clean and extensible?
Why is this interesting? In physics there are some low-level theories, such as String Theory or Loop Quantum Gravity, that do not obviously predict our three-dimensional world. This algorithm could be part of a project to find how a 3D world can be emergent.
Thank you to everybody who posted ideas here. I now have an answer to my own question. It's not great, in that it runs in O(n^3) time, but at least it's polynomial. Roughly, it works like this:
1. Represent the problem as a symmetric matrix with zero diagonal, holding the distances between every pair of points. This is equivalent to the representation using structs, but much easier to work with.
2. Assume the ordering of the points implied by the matrix (first column/row = first point) is sensible. (It may be worth pivoting to find a better ordering, but that is to do.)
3. Now create a rectangular coordinate system to fit the points, starting with the first point, which without loss of generality we take to be the origin.
4. The second point defines the x axis.
5. For each subsequent point, we calculate its coordinates one at a time, starting with the x axis. We know its distance s1 from the origin and s2 from point 2, which gives two simultaneous equations: x^2 + y^2 + ... = s1^2 and (x - x2)^2 + y^2 + ... = s2^2. Subtracting them gives x = (s1^2 - s2^2 + x2^2) / (2 x2), where x2 is the x coordinate of point 2.
6. Each further coordinate can be calculated just as easily, because the matrix of coordinates calculated so far is triangular: there is only one unknown each time.
7. The last coordinate for each point is on a new axis, a dimension that has not yet been used. Calculate it using Pythagoras on the distance from the origin, as we know all the other coordinates.
8. The coordinate on the new axis may come out imaginary. A general set of distances cannot always be represented in a coordinate system of any number of dimensions, at least not with real numbers. If this happens, I raise an error.
9. Keep going in this way for each new point, building up a vector of coordinate vectors. In general this is triangular, but the final coordinate may be near enough to zero that we consider the point's position to be represented by the existing dimensions. I store the coordinates anyway, but keep the number of dimensions the same as for the previous point. I also skip these points, as they are not needed for calculating further points (see step 10).
10. Finally, we have represented all points such that the distances are consistent.
11. As a final check, I validate that the distances match for all points, including those skipped in step 9.
12. The number of dimensions needed is the number used for the last point.
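Here is a minimal Python sketch of the procedure above; it is not the author's Haskell implementation. It assumes:

- a full symmetric distance matrix, given as a list of lists;
- a single tolerance tol, used both to decide whether a new axis is needed and for the final consistency check.

Each new point's coordinates are solved one axis at a time from its distances to the origin and to the points that opened earlier axes, using dot(x, c) = (s0^2 + |c|^2 - d^2) / 2. The remainder goes onto a new axis only if it exceeds tol. The name embed is illustrative.

```python
import math

def embed(dist, tol=1e-6):
    """Incrementally place points from a full distance matrix; return (coords, dimension)."""
    n = len(dist)
    coords = [[] for _ in range(n)]   # point 0 is the origin
    basis = [0]                       # origin, then every point that opened a new axis
    for p in range(1, n):
        s0 = dist[0][p]
        x = []
        # One unknown per existing axis: the j-th basis point has its last coordinate on axis j.
        for j, b in enumerate(basis[1:]):
            c = coords[b]
            rhs = (s0 ** 2 + dist[0][b] ** 2 - dist[b][p] ** 2) / 2   # equals dot(x, c)
            rhs -= sum(x[i] * c[i] for i in range(j))
            x.append(rhs / c[j])
        r2 = s0 ** 2 - sum(v * v for v in x)   # Pythagoras for the coordinate on a new axis
        if r2 < -tol:
            raise ValueError(f"point {p}: coordinate would be imaginary")
        r = math.sqrt(max(r2, 0.0))
        if r > tol:                            # genuinely needs a new dimension
            x.append(r)
            basis.append(p)
        coords[p] = x
    dim = len(basis) - 1
    coords = [v + [0.0] * (dim - len(v)) for v in coords]
    for i in range(n):                         # final check, including points that added no axis
        for k in range(i + 1, n):
            if abs(math.dist(coords[i], coords[k]) - dist[i][k]) > tol:
                raise ValueError(f"distance {i}-{k} is inconsistent")
    return coords, dim
```

For example, five coplanar points taken from 3D space should report a dimension of 2. A matrix that violates the triangle inequality should raise ValueError.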
If anyone is interested in an implementation of this (in Haskell), it is on my GitHub page at

algorithm to select a pair of vectors for the best "zigzag" profile

I have a set of distinct 2D vectors (over real numbers), pointing in different directions. We are allowed to pick a pair of vectors and construct their linear combination, such that the coefficients are positive and their sum is 1.
In simple words we are allowed to take a "weighted average" of any two vectors.
My goal is, for an arbitrary direction, to pick the pair of vectors whose "weighted average" points in this direction and is as long as possible.
Algebraically, given vectors a and b and a direction vector n, we are interested in maximizing this value:
[ a cross b ] / [ (a - b) cross n ]
i.e. pick a and b which maximize this value.
To be concrete, the application of this problem is for sailing boats. For every apparent wind direction the boat will have a velocity given by a polar diagram. Here's an example of such a diagram:
(Each line in this diagram corresponds to a specific wind magnitude.) Note the "impossible" front sector of about 30 degrees in each direction.
So in some directions the velocity will be high, in others low, and in some directions it's impossible to sail directly (for instance, in the direction strictly opposite to the wind).
If we need to advance in a direction in which we can't sail directly (or the velocity isn't optimal) - it's possible to advance in zigzags. This is called tacking.
Now, my goal is to recalculate a new diagram which denotes the average advance velocity in any direction, either directly or indirectly. For instance for the above diagram the corrected diagram would be this:
Note that there are no more "impossible" directions. For some directions the diagram resembles the original one, where it's best to advance directly, and no maneuver is required. For others - it shows the maximum average advance velocity in this direction assuming the most optimal maneuver is periodically performed.
What would be the most efficient algorithm to calculate this? Assume the diagram is given as a discrete set of azimuth-velocity pairs, from which we can calculate the vectors.
So far I just check all the vector pairs to select the best. There are cut-off criteria, such as picking only vectors with a positive projection on the advance direction and opposite perpendicular projections, but the complexity is still O(N^2).
I wonder if there's a more efficient algorithm.
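Here is a minimal sketch of that brute-force check. It assumes the vectors are (x, y) tuples and that n is a unit vector, so the returned value is an actual speed.

For a pair (a, b), the weighted average t*a + (1 - t)*b lies on the ray along n when t = -(b x n) / ((a - b) x n). The distance reached is then the question's [a x b] / [(a - b) x n]. Pairs that need t outside [0, 1], or that give a non-positive speed, are rejected; this plays the role of the question's cut-off criteria.

The function names are hypothetical.

```python
def cross(u, v):
    """z-component of the 2D cross product u x v."""
    return u[0] * v[1] - u[1] * v[0]

def mixed_speed(a, b, n):
    """Speed along unit direction n reached by a weighted average of a and b, or None."""
    denom = cross((a[0] - b[0], a[1] - b[1]), n)   # (a - b) x n
    if denom == 0:
        return None                     # segment ab is parallel to n
    t = -cross(b, n) / denom            # weight of a in t*a + (1 - t)*b
    if not 0.0 <= t <= 1.0:
        return None                     # the mix would need a negative weight
    speed = cross(a, b) / denom         # [a x b] / [(a - b) x n]
    return speed if speed > 0 else None

def best_pair(vectors, n):
    """O(N^2) search over all pairs, as described in the question."""
    best = (0.0, None)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            s = mixed_speed(vectors[i], vectors[j], n)
            if s is not None and s > best[0]:
                best = (s, (i, j))
    return best
```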
Many thanks to #mcdowella. For both computer-science and sailor answers!
I too thought in terms of a convex polygon, and figured out that it's only worth probing vectors on that hull. That is, if you take a combination of two vectors on the hull and replace one of them with a vector that isn't on the hull, the result is worse, since the new vector's projection onto the required direction is worse than that of both source vectors.
However, I didn't realize that any "weighted average" of two vectors lies on the straight line segment connecting them, hence the final diagram is indeed this convex hull! And, as we can see, this agrees with what I calculated with the brute-force algorithm.
Now the computer science answer
A tacking strategy gives you the convex combination of the vectors from the legs that make up the tacks.
So consider the outline made by just one contour in your diagram. The set of all possible best speeds and directions is the convex polygon formed by taking all convex combinations of the vectors to the contour. So what you want to do is form the convex hull of your contour. To find out how to go fast in any particular direction, intersect that vector with the convex hull, and use tacks with legs that correspond to the corners on either side of the hull edge that you intersect.
Looking at your diagram, the contour is concave straight upwind and straight downwind, which is what you would expect. However, there is also another concave section, somewhere between 4 and 5 o'clock and symmetrically between 7 and 8 o'clock, which appears as a straight line in your corrected diagram. So I guess there is a third direction to tack in, using two reaches on the same side of the wind, which I don't recognise from traditional sailing.
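Here is a sketch of this approach. It assumes one contour has already been converted from azimuth-velocity pairs to (x, y) vectors, and that the origin lies inside the hull, as it does for a full polar diagram. It works in two steps:

1. Build the convex hull with Andrew's monotone chain algorithm. This is a standard O(N log N) hull method, swapped in here because the answer does not name one.
2. Intersect the ray along a unit direction n with each hull edge. The edge that is hit gives both the best average speed and the two headings to tack between.

Function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def chain(seq):
        h = []
        for p in seq:
            # Pop while the last two hull points and p do not make a left turn.
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h
    lower, upper = chain(pts), chain(reversed(pts))
    return lower[:-1] + upper[:-1]

def best_speed(hull, n):
    """Intersect the ray along unit n with the hull: (speed, (corner1, corner2)) of the edge hit."""
    best = (0.0, None)
    for i in range(len(hull)):
        a, b = hull[i], hull[(i + 1) % len(hull)]
        denom = (a[0] - b[0]) * n[1] - (a[1] - b[1]) * n[0]   # (a - b) x n
        if denom == 0:
            continue
        t = -(b[0] * n[1] - b[1] * n[0]) / denom              # weight on corner a
        speed = (a[0] * b[1] - a[1] * b[0]) / denom           # (a x b) / ((a - b) x n)
        if 0.0 <= t <= 1.0 and speed > best[0]:
            best = (speed, (a, b))
    return best
```

The hull is built once per contour. The per-direction query here simply scans the hull edges; if many directions are needed, a binary search over edge angles could make it logarithmic.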
First the ex-laser sailor answer
At least for going straight upwind or downwind, the obvious guess is to tack so that each leg is of the same length and at the same bearing to the wind. If the polar diagram is symmetric around the upwind-downwind axis, this is correct. Suppose upwind is the Y axis and the possible legs are (A, B), (-A, B), (a, b) and (-a, b). Symmetrical tacking moves (A, B)/2 + (-A, B)/2 = (0, B), and the other symmetrical tack gives you (0, b). Asymmetrical tacking is (-A, B)a/(a+A) + (a, b)A/(a+A) = (0, (a/(a+A))B + (A/(a+A))b). If b != B, this lies strictly between b and B, so it is not as good as whichever of b or B is better.
For any direction that lies between the port and starboard tacks you would take to work your way upwind, the obvious strategy is to change the length of those legs but not their direction, so that the average vector traveled is in the required direction. Is this the best strategy? If not, the better strategy would make progress upwind faster than the port and starboard tacks you would take to work your way upwind, which I think is a contradiction. So for any direction that lies between the port and starboard tacks made to go upwind, I think the best strategy is indeed to make those tacks but alter the leg lengths to go in the required direction. The same should apply to tacking downwind, if you have a boat for which that is a good idea.

Generating a minimal set of vertices from a spline/curve

In my project, I represent geometry using splines. For physics and rendering I preprocess the splines and convert them into lines, and later polygons, by sampling the splines at a regular interval. However, I want to reduce the number of vertices/lines by ignoring samples that are already well enough represented by a line.
Coming up short when searching, I was wondering if there are any traditional techniques to convert a curve to a set of vertices while reducing the resulting error.
EDIT: To clarify, what I want to end up with is the set of vertices/line segments that best represents the spline using the fewest vertices/line segments. I'm not sure how to define what "best represent the spline" really means, but the goal is to make it as hard as possible to distinguish the spline from the approximation.
It can be done by recursively refining any part that is not close to the segment between its end points.
Suppose we have a curve (spline) C:[0,1]->R^n. The first approximation is the segment S between the curve's end points, [C(0), C(1)]. Take the point C(0.5) and check how far it is from S. If it is far, we have to include it in the discretization; if not, S is a good approximation. If C(0.5) is far, the next approximation is the polyline [C(0), C(0.5), C(1)], and we apply the same procedure to the parts [C(0), C(0.5)] and [C(0.5), C(1)].
If you are using a polynomial spline of order >= 3 (e.g. a cubic spline), it can have inflection point(s). In that case the curve point at the midpoint can 'fall' right on the segment while the curve around it is far from the segment, so it is good to check one more level of sub-parts.
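Here is a minimal sketch of this recursive refinement. It assumes the curve is any Python function mapping a parameter t in [0, 1] to a 2D point.

The extra_depth parameter implements the suggestion to check one more level of sub-parts. With extra_depth=1 the test probes the quarter points as well as the midpoint, which catches an inflection whose midpoint falls exactly on the chord.

The tolerance tol, the max_depth cut-off and the function names are assumptions.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab (2D tuples)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length2 = dx * dx + dy * dy
    if length2 == 0:
        return math.dist(p, a)
    # Parameter of the closest point on the infinite line, clamped to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def is_flat(curve, t0, t1, tol, extra_depth):
    """True if all probe points of curve on [t0, t1] lie within tol of the chord."""
    a, b = curve(t0), curve(t1)
    probes = 2 ** (extra_depth + 1)   # extra_depth=0: midpoint only; 1: quarter points too
    return all(point_segment_distance(curve(t0 + (t1 - t0) * k / probes), a, b) <= tol
               for k in range(1, probes))

def flatten(curve, tol=1e-3, extra_depth=1, max_depth=20):
    """Vertices of a polyline approximating curve on [0, 1]."""
    points = [curve(0.0)]
    def refine(t0, t1, depth):
        if depth >= max_depth or is_flat(curve, t0, t1, tol, extra_depth):
            points.append(curve(t1))   # the chord is good enough: keep its end point
            return
        tm = (t0 + t1) / 2
        refine(t0, tm, depth + 1)
        refine(tm, t1, depth + 1)
    refine(0.0, 1.0, 0)
    return points
```

For instance, the S-shaped curve t -> (t, u^3 - u) with u = 2t - 1 has its midpoint exactly on the chord. It collapses to a single segment with extra_depth=0, but is refined properly with extra_depth=1.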
This is entirely based on my own intuition, so I'm not sure if it coincides AT ALL with best practices. I do have a mathematics degree, so hopefully it's not too far off. Note that the computation involved may outweigh the performance gained by using fewer vertices if the spline needs to be recalculated frequently.
Let's say the vertices are in an array like [v(0), v(1), v(2),..., v(n)] where each v(i) is something like (x, y). By iterating over the vertices starting at v(1) and ending at v(n-1), we can compare a point with its neighbors in order to tell whether or not to discard it. Note that we ignore v(0) and v(n) for two reasons: (I assume) we don't want to remove our endpoints, and also v(0) and v(n) are missing a neighbor that we would need in order to set up our calculation. I can think of a couple possibilities here that might warrant examination, but one in particular seems (in my head) to be the best answer...
Consider the case where we're deciding whether or not to remove v(i) from the vertex array. We could examine the Cartesian distance between v(i) and each of its neighbors, and remove the point if both are below some threshold value T. For example, if v(i-1) = (x1, y1), v(i) = (x2, y2) and v(i+1) = (x3, y3), then we evaluate sqrt((x2-x1)^2 + (y2-y1)^2) < T && sqrt((x3-x2)^2 + (y3-y2)^2) < T, removing v(i) if the expression is true.
In 3+ dimensions, this would become more complicated - the calculation would be similar, but you would require a method of determining a point's neighbors since they might not lie directly next to the examined point in the vertex array.
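Here is a sketch of the distance-threshold test above for a 2D polyline.

One assumption goes beyond the text: the "previous neighbor" is taken to be the most recently kept vertex rather than the original v(i-1). Otherwise a long run of closely spaced vertices could all be removed at once, leaving a large gap. Like the text, this criterion looks only at spacing, not at how far v(i) lies from the line through its neighbors.

The function name is hypothetical.

```python
import math

def thin_vertices(vertices, threshold):
    """Drop interior vertices closer than threshold to both the last kept vertex and the next one."""
    if len(vertices) <= 2:
        return list(vertices)
    kept = [vertices[0]]                 # the endpoints are always kept
    for i in range(1, len(vertices) - 1):
        prev, cur, nxt = kept[-1], vertices[i], vertices[i + 1]
        if math.dist(prev, cur) < threshold and math.dist(cur, nxt) < threshold:
            continue                     # both distances below T: remove v(i)
        kept.append(cur)
    kept.append(vertices[-1])
    return kept
```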

How to convert relative gravitational force to coordinates

I have a question.
I have N objects and N x N matrix M. Each entry M(i, j) contains (a kind of) relative gravitational force indicating how strongly i pulls j toward it (or inversely pull it away from it).
I want to place these N objects on a two-dimensional R x R plane by assigning a coordinate to each object.
Is there an algorithm/method that does this? There must be some commonly used methods in astrophysics, physics, chemistry, etc.
Thank you for your help.
You are interested in assigning co-ordinates (xi,yi,zi) and mass (mi) to each object such that gravitational force is consistent, right?
Consider 8 points at a time. With three coordinates and a mass per point, you have a total of 32 unknowns, and with one force per pair, 28 equations. You can assume that the first point is at the origin and the second point is on the x axis, which removes 5 unknowns and leaves 27 unknowns for 28 equations.
So, first devise an algorithm to solve for 8 points at a time. Then incrementally add one point at each iteration.
Consider you are given n points in D dimensions. You only have distances between the points, but not the co-ordinates. Goal is to find co-ordinates for each point.
If D=1, you need to consider only two (+1) points at a time. Place the first point at the origin and the second point on the positive side of the origin. You can place the third point using its distance to the origin, putting it to the right or left according to its distance to the second point, and so on...
If D=2, place point 1 at the origin, point 2 on the positive x axis, and point 3 on the positive side of the y axis (y > 0), as determined by its distances. From the fourth point onward, you can use any two placed points to narrow the next point down to two options, and any other placed point to choose between them.
Similarly for D=3: place the first three points in the xy plane (z = 0), then place the 4th point on the positive side of the z axis (z > 0), and so on.
Coming back to gravity:
Your problem is complicated because you cannot exactly place mass at the origin. So you would need more than 5 points to place them. As I have shown above, you need at most 8 points though.
If your masses are all equal, you can calculate the distances (distance is proportional to the inverse square root of the force) and apply the D=3 case.
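The equal-mass conversion rests on an assumption: that the entries of M really follow an inverse-square law. That is the answer's premise; the question only says "a kind of" force. Under it, each distance can be recovered from its force entry up to one global constant, which only sets the overall scale of the layout:

$$F_{ij} = \frac{G\,m^2}{d_{ij}^2} \quad\Longrightarrow\quad d_{ij} = \sqrt{\frac{G\,m^2}{F_{ij}}} \;\propto\; \frac{1}{\sqrt{F_{ij}}}$$

Entries that are zero or negative (pushing away, in the question's wording) have no real distance under this model and would need separate handling.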
The problem is, given that we know the n*n distances between n objects, how to obtain their positions?
1. Put the first one, say a, at (0,0).
2. Put the second one, b, at (|b-a|, 0).
3. The third one, c, is at one of the two intersections of the two circles |p-a| = |c-a| and |p-b| = |c-b|. Solve this system of quadratic equations using the well-known formula, and choose either solution as the position of c.
4. For any other point p, do the same as for c, but choose the one of the two solutions that is consistent with the distance |p-c|. Then check the distance between p and all previous points; if the check fails, return with failure.
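Here is a minimal sketch of steps 1 to 4. It assumes a full symmetric distance matrix (a list of lists) and that the first two points are distinct. The circle intersection is the usual closed form, and a small tolerance tol absorbs rounding. The function names are hypothetical.

As in the steps, the result is unique only up to rotation, translation and reflection. If c happens to be collinear with a and b, the test against |p-c| cannot tell the two candidates apart; that is a limitation of the method as described.

```python
import math

def circle_intersections(p0, r0, p1, r1):
    """The (up to) two points at distance r0 from p0 and r1 from p1."""
    d = math.dist(p0, p1)
    a = (r0 ** 2 - r1 ** 2 + d ** 2) / (2 * d)   # distance from p0 along the line p0 -> p1
    h = math.sqrt(max(r0 ** 2 - a ** 2, 0.0))    # clamp tiny negative rounding errors
    ux, uy = (p1[0] - p0[0]) / d, (p1[1] - p0[1]) / d
    mx, my = p0[0] + a * ux, p0[1] + a * uy
    return [(mx - h * uy, my + h * ux), (mx + h * uy, my - h * ux)]

def place_points(dist, tol=1e-6):
    """Steps 1-4: a list of 2D positions, or None if the distances are inconsistent."""
    n = len(dist)
    pos = [(0.0, 0.0), (float(dist[0][1]), 0.0)]           # steps 1 and 2
    for k in range(2, n):
        cands = circle_intersections(pos[0], dist[0][k], pos[1], dist[1][k])
        if k == 2:
            chosen = cands[0]                               # step 3: either solution will do
        else:                                               # step 4: let |p - c| decide
            chosen = min(cands, key=lambda q: abs(math.dist(q, pos[2]) - dist[2][k]))
        # Check the distance to every previously placed point.
        if any(abs(math.dist(chosen, pos[j]) - dist[j][k]) > tol for j in range(k)):
            return None
        pos.append(chosen)
    return pos
```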

How to perform spatial partitioning in n-dimensions?

I'm trying to design an implementation of Vector Quantization as a C++ template class that can handle different types and dimensions of vectors (e.g. 16-dimensional vectors of bytes, or 4D vectors of doubles, etc.).
I've been reading up on the algorithms, and I understand most of it:
here and here
I want to implement the Linde-Buzo-Gray (LBG) Algorithm, but I'm having difficulty figuring out the general algorithm for partitioning the clusters. I think I need to define a plane (hyperplane?) that splits the vectors in a cluster so there is an equal number on each side of the plane.
[edit to add more info]
This is an iterative process, but I think I start by finding the centroid of all the vectors, then use that centroid to define the splitting plane, get the centroid of each of the sides of the plane, continuing until I have the number of clusters needed for the VQ algorithm (iterating to optimize for less distortion along the way). The animation in the first link above shows it nicely.
My questions are:
What is an algorithm to find the plane once I have the centroid?
How can I test a vector to see if it is on either side of that plane?
If you start with one centroid, then you'll have to split it, basically by doubling it and slightly moving the points apart in an arbitrary direction. The plane is just the plane orthogonal to that direction.
But you don't need to compute that plane.
More generally, region i is defined as the set of points that are closer to the centroid c_i than to any other centroid. When you have two centroids, each region is a half-space, so the two are separated by a (hyper)plane.
How to test on a vector x to see on which side of the plane it is? (that's with two centroids)
Just compute the distance ||x-c1|| and ||x-c2||, the index of the minimum value (1 or 2) will give you which region the point x belongs to.
More generally, if you have n centroids, you compute all the distances ||x-c_i||. The centroid that x is closest to (i.e. the one with the minimal distance) gives the region x belongs to.
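Here is a minimal sketch of the splitting and nearest-centroid assignment described above, for vectors given as tuples of numbers. It works in two steps:

1. Split each codeword into two copies nudged by +eps and -eps along the all-ones direction (an arbitrary choice of direction).
2. Refine with a fixed number of Lloyd iterations: assign every vector to its nearest codeword, then move each codeword to the mean of its cell.

The sketch assumes n_codes is a power of two. The eps and iters defaults are illustrative, not values taken from the LBG paper, and the function names are hypothetical.

```python
import math

def centroid(vectors):
    dim = len(vectors[0])
    return tuple(sum(v[k] for v in vectors) / len(vectors) for k in range(dim))

def nearest(x, codebook):
    """Index of the codeword closest to x: the region x belongs to."""
    return min(range(len(codebook)), key=lambda i: math.dist(x, codebook[i]))

def lbg(data, n_codes, eps=1e-3, iters=20):
    codebook = [centroid(data)]
    while len(codebook) < n_codes:
        # Split every codeword into two copies moved slightly apart.
        codebook = [tuple(x + s * eps for x in cw) for cw in codebook for s in (1, -1)]
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for x in data:
                cells[nearest(x, codebook)].append(x)
            # Move each codeword to its cell's mean; leave it alone if the cell is empty.
            codebook = [centroid(cell) if cell else cw for cell, cw in zip(cells, codebook)]
    return codebook
```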
I don't quite understand the algorithm, but the second question is easy:
Let's call V a vector which extends from any point on the plane to the point-in-question. Then the point-in-question lies on the same side of the (hyper)plane as the normal N iff V·N > 0
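For two centroids the two answers agree. The splitting plane is the perpendicular bisector of c1 and c2, so take V = x - midpoint and N = c2 - c1. The sign of V·N then tells you the same thing as comparing ||x-c1|| with ||x-c2||. Here is a small sketch, with hypothetical function names:

```python
def bisector(c1, c2):
    """Perpendicular bisector of two centroids: (point on the plane, normal pointing towards c2)."""
    mid = tuple((a + b) / 2 for a, b in zip(c1, c2))
    normal = tuple(b - a for a, b in zip(c1, c2))
    return mid, normal

def side(x, point_on_plane, normal):
    """V . N with V = x - point_on_plane: > 0 on the normal's side, < 0 on the other, 0 on the plane."""
    return sum((xi - pi) * ni for xi, pi, ni in zip(x, point_on_plane, normal))

def nearer_to_c2(x, c1, c2):
    mid, normal = bisector(c1, c2)
    return side(x, mid, normal) > 0
```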
