Linear Least Squares Fit of Sphere to Points

Linear Least Squares Fit of Sphere to Points - algorithm

I'm looking for an algorithm to find the best fit between a cloud of points and a sphere.
That is, I want to minimise
where C is the centre of the sphere, r its radius, and each P a point in my set of n points. The variables are obviously Cx, Cy, Cz, and r. In my case, I can obtain a known r beforehand, leaving only the components of C as variables.
I really don't want to have to use any kind of iterative minimisation (e.g. Newton's method, Levenberg-Marquardt, etc) - I'd prefer a set of linear equations or a solution explicitly using SVD.

There are no matrix equations forthcoming. Your choice of E is badly behaved; its partial derivatives are not even continuous, let alone linear. Even with a different objective, this optimization problem seems fundamentally non-convex; with one point P and a nonzero radius r, the set of optimal solutions is the sphere about P.
You should probably reask on an exchange with more optimization knowledge.

You might find the following reference interesting but I would warn you
that you will need to have some familiarity with geometric algebra -
particularly conformal geometric algebra to understand the
mathematics. However, the algorithm is straight forward to implement with
standard linear algebra techniques and is not iterative.
One caveat, the algorithm, at least as presented fits both center and
radius, you may be able to work out a way to constrain the fit so the radius is constrained.
Total Least Squares Fitting of k-Spheres in n-D Euclidean Space Using
an (n+ 2)-D Isometric Representation. L Dorst, Journal of Mathematical Imaging and Vision, 2014 p1-21
Your can pull in a copy from
Leo Dorst's researchgate page
One last thing, I have no connection to the author.

Short description of making matrix equation could be found here.
I've seen that WildMagic Library uses iterative method (at least in version 4)

You may be interested by the best fit d-dimensional sphere, i.e. minimizing the variance of the population of the squared distances to the center; it has a simple analytical solution (matrix calculus): see the appendix of the open access paper of Cerisier et al. in J. Comput. Biol. 24(11), 1134-1137 (2017), https://doi.org/10.1089/cmb.2017.0061
It works when the data points are weighted (it works even for continuous distributions; as a by-product, when d=1, a well-known inequality is retrieved: the kurtosis is always greater than the squared skewness plus 1).

Difficult to do this without iteration.
I would proceed as follows:
find the overall midpoint, by averaging (X,Y,Z) coords for all points
with that result, find the average distance Ravg to the midpoint, decide ok or proceed..
remove points from your set with a distance too far from Ravg found in step 2
go back to step 1 (average points again, yielding a better midpoint)
Of course, this will require some conditions for (2) and (4) that depends on the quality of your points cloud !

Ian Coope has an interesting algorithm in which he linearized the problem using a change of variable. The fit is quite robust, and although it slightly redefines the condition of optimality, I've found it to be generally visually better, especially for noisy data.
A preprint of Coope's paper is available here: https://ir.canterbury.ac.nz/bitstream/handle/10092/11104/coope_report_no69_1992.pdf.
I found the algorithm to be very useful, so I implemented it in scikit-guess as skg.nsphere_fit. Let's say you have an (m, n) array p, consisting of M points of dimension N (here N=3):
r, c = skg.nsphere_fit(p)
The radius, r, is a scalar and c is be an n-vector containing the center.

Related

How to calculate the normals of a box?

I am trying to create an algorithm that calculate the normals of a model/ mesh. People have been telling me to use the cross products between the two vectors which at first seem like a good idea until I discovered that it might not always work. For instance just imagine a box with its front face sitting at the origin and its back face down the Z axis. Here is an image:
I do apologize for bad hand writing but that shouldn't be of any significance. As you can see,I cross v and u to get the normal pointing toward the positive z axis. However, If I use that same calculation to calculate the normal for the back face then obviously the normal will then be a vector directing inside the shape. The result is that I have inaccurate normals to calculate the brightness of a light. I want the normal to be facing away from the model at all time.
I know there gotta be a better way to calculate the normal but I don't know what it is. Can anyone suggests to me another algorithm to calculate the normal that would get rid of this problem? If not then there has to be a way to check whether or not a normal is facing inside the object / model. If so then can you suggests it in the answer and where I would find an explanation about it because I would love to have an intuition on how these methodologies work.

Most software packages obey a configurable cyclic ordering for triangle indices - clockwise or anti-clockwise. Thus all meshes they export have self-consistent ordering, and as long as your program uses the same convention, you should have nothing to worry about.
Having said that, I imagine you want to know what to do in the hypothetical (?) situation where the index ordering is inconsistent.
One method we could use is ray-intersection. The important theorem is that a ray with its source outside the mesh will only intersect the mesh an even number of times, and if inside, odd.
To do this, we can do the following:
Calculate the "normal" using the cross product as above (and normalize it) => N
Take any point on the triangle (preferably the midpoint)
Increment this point along the normal by some small epsilon value (depends on your floating point format and size of model - I'd say 1e-4 for single and 1e-8 for double precision) => P
Intersect this ray [dir = N, src = P] with all triangles in the mesh (a good algorithm for this would be Möller–Trumbore)
If the number of intersections is even, then the ray started from outside of the mesh; this means that the normal points outwards from the mesh (because you incremented its source from a point on the surface). - and of course, vice versa.
Minor (-ish ?) digression: a naive approach to the above, of looping through all triangles in the mesh, would be O(n) - and hence the whole procedure would have quadratic time complexity. This is perfectly fine for very small meshes of ~20 triangles (e.g. a box), but not ideal for any larger!
You can use spatial sub-division techniques to lower the cost of this intersection step:
K-D trees / Octrees: These require O(n log n) (for the best algorithm, that is - see Ingo Wald's paper) to construct, but intersections are guaranteed to be O(log n) if done properly. The overall complexity would then be O(n log n), which is pretty much the best you can get
Grid: This simply partitions the search space and triangles into smaller boxes. Construction is O(n) and much more memory-efficient. Intersection time is still O(n), but the constant factor is much smaller than that of the naive approach.

Cross products are not commutative so v x u is not the same as u x v. In fact, they will be the exact opposite.
For the front face, you want to take u x v (assuming you're in a right-hand coordinate system), and the back face you want to cross v x u.
See right-hand rule for more info on how crossing vectors works.

Linear programming algorithm

Consider the following algorithm for linear programming, minimizing [c,x] with A.x <= b.
(1) Start with a feasible point x_0
(2) Given a feasible point x_k, find the greatest alpha such that x_k - alpha.c is admissible (straighforward, look at the ratios of the components of A.x0 to A.c)
(3) take the normal unit vector n to the hyperplane we just reached, pointing inwards. Project n on the plane [c,.] giving r = n - [n,c]/[c,c].c, then look for the greatest beta for which x_k - alpha.c + beta.r is admissible. Set x_{k+1} = x_k - alpha.c + 1/2*beta.r
If x_{k+1} is close enough to x_k within tolerance, return it, otherwise go to (2)
The basic idea is to follow the gradient until one hits a wall. Then, rather than following the shell of the simplex, like the simplex algorithm would do, the solution is kicked back inside the simplex, on a plane where the solutions are no worse, in the direction of the normal vector. The solution moves halfway between the starting point and the next constraint in this direction. It's no worse than before, but now it's more "inside" the simplex, where is has a shot at taking long leaps towards the optimum.
Though the probability of hitting an intersection of more than one hyperplane is 0, if one gets close enough to multiple hyperplane within a certain tolerance, the average of the normals may be taken.
This can be generalized to any convex objective function by following geodesics on the levels of the function. For quadratic programming in particular, one rotates the solution towards the inside of the simplex.
Questions:
Does this algorithm have a name or fall within a category of linear-programming algorithms?
Does it have an obvious flaw that I'm overlooking?

I am pretty sure this doesn't work, unless I miss something: your algorithm will not start moving in most cases.
Assume your variable x is taken in R^n.
A polyhedron of the form Ax <= b is contained in a 'maximal' affine subspace P of dimension p <= n, and usually p is much smaller than n (you will have a lot of equality constraints, which can be implicit or explicit: you cannot assume that the expression of P is simple to obtain from A and b).
Now assume you can find an initial point x_0 (which is far from being obvious, btw) ; there is very little chance that the direction of the gradient c is a feasible direction. You would need to consider the projection of the direction c on P, and this is very difficult to do in practice (how would you compute such projection?).
Then, what you want in your step (3) is not the normal direction of the hyperplane you reached, but again its projection on P (visualize the polyedron as a 2d polyedron embedded in a 3d space can help).
There is a very good reason why barrier functions are used in the interior point methods: it is very difficult to describe in practice the geometry of the high-dimension convex sets from the constraints (even simple ones like polyedrons), and things that "seems obvious" when you draw a picture on a piece of paper will not usually work when the dimension of the polyedron increases.
One last point is that your algorithm would not give the exact solution, whereas the simplex does in theory (and I read somewhere it can be done in practice by working with exact rational numbers).

read up on interior point methods: http://en.wikipedia.org/wiki/Interior_point_method
this approach can have nice theoretical properties, but the algorithm performance can tend to tail off in practice

Nearest neighbor zones visualized

I'm writing an app that looks up points in two-dimensional space using a k-d tree. It would be nice, during development, to be able to "see" the nearest-neighbor zones surrounding each point.
In the attached image, the red points are points in the k-d tree, and the blue lines surrounding each point bound the zone where a nearest neighbor search will return the contained point.
The image was created thusly:
for each point in the space:
da = distance to nearest neighbor
db = distance to second-nearest neighbor
if absolute_value(da - db) < 4:
draw blue pixel
This algorithm has two problems:
(more important) It's slow on my (reasonably fast Core i7) computer.
(less important) It's sloppy, as you can see by the varying widths of the blue lines.
What is this "visualization" of a set of points called?
What are some good algorithms to create such a visualization?

This is called a Voronoi Diagram and there are many excellent algorithms for generating them efficiently. The one I've heard about most is Fortune's algorithm, which runs in time O(n log n), though others algorithms exist for this problem.
Hope this helps!

Jacob,
hey, you found an interesting way of generating this Voronoi diagram, even though it is not so efficient.
The less important issue first: the varying thickness boundaries that you get, those butterfly shapes, are in fact the area between the two branches of an hyperbola. Precisely the hyperbola given by the equation |da - db| = 4. To get a thick line instead, you have to modify this criterion and replace it by the distance to the bisector of the two nearest neighbors, let A and B; using vector calculus, | PA.AB/||AB|| - ||AB||/2 | < 4.
The more important issue: there are two well known efficient solutions to the construction of the Voronoi diagram of a set of points: Fortune's sweep algorithm (as mentioned by templatetypedef) and Preparata & Shamos' Divide & Conquer solutions. Both run in optimal time O(N.Lg(N)) for N points, but aren't so easy to implement.
These algorithm will construct the Voronoi diagram as a set of line segments and half-lines. Check http://en.wikipedia.org/wiki/Voronoi_diagram.
This paper "Primitives for the manipulation of general subdivisions and the computation of Voronoi" describes both algorithms using a somewhat high-level framework, caring about all implementation details; the article is difficult but the algorithms are implementable.
You may also have a look at "A straightforward iterative algorithm for the planar Voronoi diagram", which I never tried.
A totally different approach is to directly build the distance map from the given points for example by means of Dijkstra's algorithm: starting from the given points, you grow the boundary of the area within a given distance from every point and you stop growing when two boundaries meet. [More explanations required.] See http://1.bp.blogspot.com/-O6rXggLa9fE/TnAwz4f9hXI/AAAAAAAAAPk/0vrqEKRPVIw/s1600/distmap-20-seed4-fin.jpg
Another good starting point (for efficiently computing the distance map) can be "A general algorithm for computing distance transforms in linear time".

From personal experience: Fortune's algorithm is a pain to implement. The divide and conquer algorithm presented by Guibas and Stolfi isn't too bad; they give detailed pseudocode that's easy to transcribe into a procedural programming language. Both will blow up if you have nearly degenerate inputs and use floating point, but since the primitives are quadratic, if you can represent coordinates as 32-bit integers, then you can use 64 bits to carry out the determinant computations.
Once you get it working, you might consider replacing your kd-tree algorithms, which have a Theta(√n) worst case, with algorithms that work on planar subdivisions.

You can find a great implementation for it at D3.js library: http://mbostock.github.com/d3/ex/voronoi.html

Approximate Estimation of Distance Matrices

I have a set of N objects, and I'd like to compute a NxN distance matrix. Sometimes my set of N objects is very large, and I'd like to compute an approximation to the NxN distance matrix by only computing a subset of the distance comparisons.
Can anyone point me in the direction of something that calculates approximations to a full distance matrix? I have some ideas in mind, but I'd like to avoid re-inventing the wheel.
Edit: An example of the type of algorithm would take advantage of the fact that if there is a very small distance between object A and object B, and there is a very small distance between object B and object C, there has to be a somewhat short distance between objects A and C.

I had this same question and ended up writing Python code for it:
https://github.com/jpeterbaker/lazyDistance
README.md explains how the triangle inequality can be used to update upper and lower bounds for each distance.
Just run the Python file as a script for an example in 2-dimensional space. The plotted lines are the only distances that were actually calculated.
In my version, the time savings aren't about having a large number of objects. As I've written it, it's a O(n^4) algorithm, so it's actually worse than just calculating all distances if the number of objects is large. But my method will save time when you have a modest number of objects and the distance function is very expensive to calculate. It assumes that it is faster to do several O(n^2) operations rather than a single distance measurement.
If n is large, you could look for cheaper methods to decide which distance to calculate next (that don't involve arithmetic with n^2 entries of distance bounds matrices). You also may not need to update all 2*n^2 bounds every time that this code does.

Honestly, I think it depends how close you want your approximation to be and how big your subset is. If you just want some overall feel of what the matrix will look like, you can do simple linear interpolation on a random subset (including the maximal and minimal nodes) getting pretty accurate (tm) results.
I think the real trick here is figuring out the heuristic (linear, quadratic, etc interpolation) and the subset size. You could also figure out the distance matrices of various subsets and then interpolate those matrices with some method (linear, spherical linear, cubic).
Depending on your initial sample, it's pretty much an heuristic trial and error until you go "oh that's good enough for what I need".

Are your "objects" on a network? If the objects are in a network, you can use this or this that yields the all-pairs shortest paths. If not, you're pretty much stuck with calculated all the n x n distances, I think.

The solution you require is similar to what we commonly see in a graph, you can use All pair shortest path for finding the distance, you can also look at johnson's algorithm

Is there an efficient algorithm to generate a 2D concave hull?

Having a set of (2D) points from a GIS file (a city map), I need to generate the polygon that defines the 'contour' for that map (its boundary). Its input parameters would be the points set and a 'maximum edge length'. It would then output the corresponding (probably non-convex) polygon.
The best solution I found so far was to generate the Delaunay triangles and then remove the external edges that are longer than the maximum edge length. After all the external edges are shorter than that, I simply remove the internal edges and get the polygon I want. The problem is, this is very time-consuming and I'm wondering if there's a better way.

One of the former students in our lab used some applicable techniques for his PhD thesis. I believe one of them is called "alpha shapes" and is referenced in the following paper:
http://www.cis.rit.edu/people/faculty/kerekes/pdfs/AIPR_2007_Gurram.pdf
That paper gives some further references you can follow.

This paper discusses the Efficient generation of simple polygons for characterizing the shape of a set of points in the plane and provides the algorithm. There's also a Java applet utilizing the same algorithm here.

The guys here claim to have developed a k nearest neighbors approach to determining the concave hull of a set of points which behaves "almost linearly on the number of points". Sadly their paper seems to be very well guarded and you'll have to ask them for it.
Here's a good set of references that includes the above and might lead you to find a better approach.

The answer may still be interesting for somebody else: One may apply a variation of the marching square algorithm, applied (1) within the concave hull, and (2) then on (e.g. 3) different scales that my depend on the average density of points. The scales need to be int multiples of each other, such you build a grid you can use for efficient sampling. This allows to quickly find empty samples=squares, samples that are completely within a "cluster/cloud" of points, and those, which are in between. The latter category then can be used to determine easily the poly-line that represents a part of the concave hull.
Everything is linear in this approach, no triangulation is needed, it does not use alpha shapes and it is different from the commercial/patented offering as described here ( http://www.concavehull.com/ )

A quick approximate solution (also useful for convex hulls) is to find the north and south bounds for each small element east-west.
Based on how much detail you want, create a fixed sized array of upper/lower bounds.
For each point calculate which E-W column it is in and then update the upper/lower bounds for that column. After you processed all the points you can interpolate the upper/lower points for those columns that missed.
It's also worth doing a quick check beforehand for very long thin shapes and deciding wether to bin NS or Ew.

A simple solution is to walk around the edge of the polygon. Given a current edge om the boundary connecting points P0 and P1, the next point on the boundary P2 will be the point with the smallest possible A, where
H01 = bearing from P0 to P1
H12 = bearing from P1 to P2
A = fmod( H12-H01+360, 360 )
|P2-P1| <= MaxEdgeLength
Then you set
P0 <- P1
P1 <- P2
and repeat until you get back where you started.
This is still O(N^2) so you'll want to sort your pointlist a little. You can limit the set of points you need to consider at each iteration if you sort points on, say, their bearing from the city's centroid.

Good question! I haven't tried this out at all, but my first shot would be this iterative method:
Create a set N ("not contained"), and add all points in your set to N.
Pick 3 points from N at random to form an initial polygon P. Remove them from N.
Use some point-in-polygon algorithm and look at points in N. For each point in N, if it is now contained by P, remove it from N. As soon as you find a point in N that is still not contained in P, continue to step 4. If N becomes empty, you're done.
Call the point you found A. Find the line in P closest to A, and add A in the middle of it.
Go back to step 3
I think it would work as long as it performs well enough — a good heuristic for your initial 3 points might help.
Good luck!

You can do it in QGIS with this plug in;
https://github.com/detlevn/QGIS-ConcaveHull-Plugin
Depending on how you need it to interact with your data, probably worth checking out how it was done here.

As a wildly adopted reference, PostGIS starts with a convexhull and then caves it in, you can see it here.
https://github.com/postgis/postgis/blob/380583da73227ca1a52da0e0b3413b92ae69af9d/postgis/postgis.sql.in#L5819

The Bing Maps V8 interactive SDK has a concave hull option within the advanced shape operations.
https://www.bing.com/mapspreview/sdkrelease/mapcontrol/isdk/advancedshapeoperations?toWww=1&redig=D53FACBB1A00423195C53D841EA0D14E#JS
Within ArcGIS 10.5.1, the 3D Analyst extension has a Minimum Bounding Volume tool with the geometry types of concave hull, sphere, envelope, or convex hull. It can be used at any license level.
There is a concave hull algorithm here: https://github.com/mapbox/concaveman

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio