This question is not directly related to a particular programming language but is an algorithmic question.
What I have is a lot of samples of a 2D function. The samples are at random locations, they are not uniformly distributed over the domain, the sample values contain noise and each sample has a confidence-weight assigned to it.
What I'm looking for is an algorithm to reconstruct the original 2D function based on the samples, so a function y' = G(x0, x1) that approximates the original well and interpolates areas where samples are sparse smoothly.
It goes into the direction of what scipy.interpolate.griddata is doing, but with the added difficulty that:
the sample values contain noise - meaning that samples should not just be interpolated, but nearby samples also averaged in some way to average out the sampling noise.
the samples are weighted, so, samples with higher weight should contrbute more strongly to the reconstruction that those with lower weight.
scipy.interpolate.griddata seems to do a Delaunay triangulation and then use the barycentric cordinates of the triangles to interpolate values. This doesn't seem to be compatible with my requirement of weighting samples and averaging noise though.
Can someone point me in the right direction on how to solve this?
Based on the comments, the function is defined on a sphere. That simplifies life because your region is both well-studied and nicely bounded!
First, decide how many Spherical Harmonic functions you will use in your approximation. The fewer you use, the more you smooth out noise. The more you use, the more accurate it will be. But if you use any of a particular degree, you should use all of them.
And now you just impose the condition that the sum of the squares of the weighted errors should be minimized. That will lead to a system of linear equations, which you then solve to get the coefficients of each harmonic function.
Related
Say we are making a program to render the plot of a function (black box) provided by the user as a sequence of line segments. We want to get the minimum number of samples of the function so the resulting image "looks" like the function (the exact meaning of "looks" here is part of the question). A naive approach might be to just sample at fixed intervals but we can probably do better than that eg by sampling the "curvy bits" more than the "linear bits". Are there systematic approaches/research on this problem?
This reference can be helpful which is using the combined sampling method. Before that its related works explain more about other methods of sampling:
There are several strategies for plotting the function y = f(x) on interval Ω = [a, b]. The
naive approach based on sampling of f in a fixed amount of the equally spaced points is
described in [20]. The simple functions suffer from oversampling, while the oscillating curves
are under-sampled; these issues are mentioned in [14]. Another approach based on the interval
constraint plot constructing a hull of the curve was described in [6], [13], [20]. The automated
detection of a useful domain and a range of the function is mentioned in [41]; the generalized
interval arithmetic approach is described in [40].
A significant refinement is represented by adaptive sampling providing a higher sampling
density in the higher-curvature regions. The are several algorithms for the curve interpolation preserving the speed, for example: [37], [42], [43]. The adaptive feed rate technique
is described in [44]. An early implementation in the Mathematica software is presented in
[39]. By reducing data, these methods are very efficient for the curve plotting. The polygonal approximation of the parametric curve based on adaptive sampling is mentioned in the
several papers. The refinement criteria, as well as the recursive approach, are discussed in
[15]. An approximation by the polygonal curves is described in [7], the robust method for
the geometric and spatial approximation of the implicit curves can be found in [27], [10], the
affine arithmetic working in the triangulated models in [32]. However, the map projections
are never defined by the implicit equations. Similar approaches can be used for graph drawing
[21].
Other techniques based on the approximation by the breakpoints can be found in many
papers: [33], [9], [3]; these approaches are used for the polygonal approximation of the closed
curves and applied in computer vision.
Hence, these are the reference methods that define some measures for a "good" plot and introduce an approach to optimize the plot base on the measure:
constructing a hull of the curve
automated detection of a useful domain and a range of the function
adaptive sampling: providing a higher sampling density in the higher-curvature regions
providing a higher sampling density in the higher-curvature regions
approximation by the polygonal curves
affine arithmetic working in the triangulated models
combined sampling: providing the polygonal approximation of the parametric curve involving the discontinuities will be presented. The modified method will be used for the function f(x) reconstruction and plot. Based on the ideas of splitting the domain into the subintervals without the discontinuities, it represents a typical problem solvable by the recursive approach.
Let's say I want to estimate the camera pose for a given image I and I have a set of measurements (e.g. 2D points ui and their associated 3D coordinates Pi) for which I want to minimize the error (e.g. the sum of squared reprojection errors).
My question is: How do I compute the uncertainty on my final pose estimate ?
To make my question more concrete, consider an image I from which I extracted 2D points ui and matched them with 3D points Pi. Denoting Tw the camera pose for this image, which I will be estimating, and piT the transformation mapping the 3D points to their projected 2D points. Here is a little drawing to clarify things:
My objective statement is as follows:
There exist several techniques to solve the corresponding non-linear least squares problem, consider I use the following (approximate pseudo-code for the Gauss-Newton algorithm):
I read in several places that JrT.Jr could be considered an estimate of the covariance matrix for the pose estimate. Here is a list of more accurate questions:
Can anyone explain why this is the case and/or know of a scientific document explaining this in details ?
Should I be using the value of Jr on the last iteration or should the successive JrT.Jr be somehow combined ?
Some people say that this actually is an optimistic estimate of the uncertainty, so what would be a better way to estimate the uncertainty ?
Thanks a lot, any insight on this will be appreciated.
The full mathematical argument is rather involved, but in a nutshell it goes like this:
The outer product (Jt * J) of the Jacobian matrix of the reprojection error at the optimum times itself is an approximation of the Hessian matrix of least squares error. The approximation ignores terms of order three and higher in the Taylor expansion of the error function at the optimum. See here (pag 800-801) for proof.
The inverse of the Hessian matrix is an approximation of the covariance matrix of the reprojection errors in a neighborhood of the optimal values of the parameters, under a local linear approximation of parameters-to-errors transformation (pag 814 above ref).
I do not know where the "optimistic" comment comes from. The main assumption underlying the approximation is that the behavior of the cost function (the reproj. error) in a small neighborhood of the optimum is approximately quadratic.
How to reliably find out whether two Bezier curves intersect? By "reliably" I mean the test will answer "yes" only when the curves intersect, and "no" only when they don't intersect. I don't need to know what parameters the intersection was found at. I also would like to use floating-point numbers in the implementation.
I found several answers here which use the curves' bounding-boxes for the test: this is not what I'm after as such test may report intersection even if the curves don't intersect.
The closest thing I found so far is the "bounding wedge" by Sederberg and Meyers but it "only" distinguishes between at-most-one and two-or-more intersection, whereas I want to know if there is at-most-zero and one-or-more intersections.
I am assuming cubic bezier curves.
The most reliable method for reporting intersections, using floating point computation, is probably to find them, combined with error analysis.
The main problem, when floating point computations are involved, is inconsistency in computed results w.r.t. topology. Unfortunately this is unavoidable, if you need to compute anything in computational geometry within a reasonable amount of time.
So instead of stressing on the right algorithm for intersection calculation, picking a simple one and implementing error analysis is probably the solution.
I would try to implement an efficient subdivision algorithm like bezier-clipping (or a variant of quadratic clipping –Nicholas North's Geo-clip), and with running error analysis to compute tight error bounds so that we don't "miss" intersections.
To elaborate, The main sources of floating-point (double prec.) error in these subdivision based algorithms are:
Truncation error: especially the error in the input coefficients etc. which are also finite —we can't do much here within the algorithm.
Roundoff error during De Casteljau subdivision and point evaluation.
I have used the running error bounds for De Casteljau's algorithm —explained here, along with Geo-clip algorithm. It is fast and robust. (B.t.w. This theses, in general, is a good read if you want to make polynomial/bezier algorithms more robust)
Assuming, you know the basics of the bezier clipping algorithm, the general idea is to expand the hybrid bezier curve (in the first paper linked) and the fat line appropriately with the error bounds for each clip.
Some other unrelated ideas:
You can try a variant of Bentley-Ottmann sweepline algorithm. First you have to split the bezier curves as X monotone segments; and look at their Y orderings as you sweep across them. This method has a few disadvantages, since bezier curves are also capable of intersecting with multiplicity of more than one - think of tangential intersection. Doing an error analysis may be difficult here (when you compute a y value, there is some floating point error involved)
Interval Projected Polyhedron algorithm: This uses rounded interval arithmetic for robustness. But the algorithm for 2D Bezier curves gets quite complicated
There are a few cases you might come across:
Self intersections
Overlapping (coincident) curves: Subdivision algorithms will keep going in this case. This can be easy to check though.
Good luck :)
Assuming cubic beziers, the intersection points are real roots of a 9th degree polynomial. The existence of such roots within an interval (from negative to positive infinity for infinitely long curves, or 0 to 1 for your typical piecewise cubic beziers) can be checked robustly using a Sturm sequence. This will only work if we allow extending one of the curves to infinity. The algorithm will have no loops, and only use basic arithmetic operations (add, subtract and multiply, while division should be avoidable).
For maximum robustness, you could use arbitrary precision math. Since the number of steps is constant, the maximum possible number of digits in all temporary results is bounded. That way, your algorithm will always return the correct result, no matter how pathological the input (eg. curves barely touching).
It might be possible to use ordinary floating point first, and detect potential pathological cases (intermediate results becoming zero, when adding/subtracting previous intermediate results).
The formulas for getting the polynomial terms from Bezier control points are truly a sight to behold, but luckily you don't have to work them out,
they're right here on Github.
There's a thread from 2004, Closed-form of Bezier intersection test on comp.graphics.algorithms with more details.
If you're dealing with quadratic beziers, the polynomial will only be 4th degree.
An idea from the top of my head.
Map them, so they are in the same domain, such that you can subtract them. Then just do a root finding. There are lots and lots of numeric methods to root finding.
If you need to see if two curves intersect visually, say real-time screen graphics in a game or something, then the easiest way to do so, by far, is to simply compare their pixels.
Get the bbox and pixel lookup tables (LUTs) for both curves, check if there's bbox overlap: no overlap? done. There is no intersection. Overlap? sort the LUTs with a fast sorter, and then just run a compare. The moment you find a single match, you're done. There is an overlap, and you don't care where.
If you have to do this for lots of curves: use a library that does this for you, don't waste your time implementing it yourself, you're not going to be as efficient (for large collections things like oct-trees and scanline checks become far more efficient)
However, if you need to know if there is absolute, mathematically precise overlap, say for as-correct-as-possible design work, then you can't cut corners: actually run a real intersection detection algorithm to find all possible intersection points. Real-time is mostly irrelevant in this setting, you can spend the few more cycles to run a proper detection algorithm.
I have a set of 300.000 or so vectors which I would like to compare in some way, and given one vector I want to be able to find the closest vector I have thought of three methods.
Simple Euclidian distance
Cosine similarity
Use a kernel (for instance Gaussian) to calculate the Gram matrix.
Treat the vector as a discrete probability distribution (which makes
sense to do) and calculate some divergence measure.
I do not really understand when it is useful to do one rather than the other. My data has a lot of zero-elements. With that in mind, is there some general rule of thumbs as to which of the three methods is the best?
Sorry for the weak question, but I had to start somewhere...
Thank you!
Your question is not quite clear, are you looking for a distance metric between vectors, or an algorithm to efficiently find the nearest neighbour?
If your vectors just contain a numeric type such as doubles or integers, you can find a nearest neighbour efficiently using a structure such as the kd-tree. (since you are just looking at points in d-dimensional space). See http://en.wikipedia.org/wiki/Nearest_neighbor_search, for other methods.
Otherwise, choosing a distance metric and algorithm is very much dependent on the content of the vectors.
If your vectors are very sparse in nature and if they are binary, you can use Hamming or Hellinger distance. When your vector dimensions are large, avoid using Euclidean (refer http://en.wikipedia.org/wiki/Curse_of_dimensionality)
Please refer to http://citeseerx.ist.psu.edu/viewdoc/download?rep=rep1&type=pdf&doi=10.1.1.154.8446 for a survey of distance/similarity measures, although the paper limits it to pair of probability distributions.
I have a set points whose coordinates are given by the arrays x, y and z and the value of the density field in each point is stored in the array d.
I would like to reconstruct the density field on a uniform grid. What's the best algorithm to do that?
I know that in python, the scipy module come in handy with the griddata function but I would like to write my own code, I just need a hint.
If you have some sort of scalar field and the points are the origins of the field, you can implement a brute force approach by walking all lattice points and calculating the field intensity given the sources. There are both recursive methods that allow "blanking" wide volumes where the field is more or less constant, and techniques to save some CPU time by calculating the variations from one point to the next.
If the points you have are samplings of a value, then you will have to decompose your space in volumes and interpolate the values. You can employ a simple Voronoi decomposition - this is usually done in 2D for precipitation measurements - or a Delaunay tetrahedralization (you can look into TetGen's documentation). The first approach assumes that the function is constant throughout each Voronoi volume; the last allows rendering a trilinear interpolation.
If you need to smooth a 3D grid, the trilinear interpolation looks like the best approach.
There are also other methods used for fast visualization, that involve maintaining a list of 3D points in order of distance from any one given point in your regular grid. When moving through the grid, you recalculate distances using quadratic increments. Then, you perform a simple interpolation based on a subset of points of chosen cardinality (i.e., if you consider the four nearest points at distances d1..d4, you would calculate the value in P by proportionally weighing the values v1..v4). This approach is fast and easy to implement by yourself, but be warned that it underperforms wherever the minimum distance between points is less than the lattice step (you can compensate by considering more points where this happens; and the effect is less evident if the sampled function is smooth at the same scale).
If you want to implement a mathematical method yourself, you need to learn the theory, of course. In this case, it's 3D scattered data interpolation.
Wikipedia, MATLAB help and scipy help say there are at least half a dozen different methods. WP has a fairly good description of them and there's a comparison article but I strongly suggest you find something in your native language on such a terminology-intensive subject.
One approach is to form the Delaunay triangulation of the scattered points [x,y,z], (actually a tetrahedralisation in your 3d case!) and perform interpolation within each element using a linear representation of the density field, defined at the tetrahedron vertices.
To evaluate the density at each structured grid point you would (i) determine which tetrahedron the point lay within and (ii) evaluate the linear interpolant.
Forming the Delaunay triangulation is non-trivial, put there are a few good libraries that can be used for this, depending on your language of choice. One good option is CGAL.
Hope this helps.