Algorithm to approximate non-linear equation system solution - algorithm

I'm looking for an algorithm to approximate the solution of the following equation system:
The equations have to be solved on an embedded system, in C++.
Background:
We measure the 2 variables X_m and Y_m, so they are known
We want to compute the real values: X_r and Y_r
X and Y are real numbers
We measure the functions f_xy and f_yx during calibration. We have maximal 18 points of each function.
It's possible to store the functions as a look-up table
I tried to approximate the functions with 2nd order polynomials and compute the solution, but it was not accurate enough, because of the fitting error.
I am looking for an algorithm to approximate the results in an embedded system in C++, but I don't even know what to search for. I found some papers on the theory link, but I think there must be an easier way to do it in my case.
Also: how can I determine during calibration, whether the functions can be solved with the algorithm?

Fitting a second-order polynomial through f_xy? That's generally not viable. The go-to solution would be Runga-Kutta interpolation. You pick two known values left and two to the right of your argument, with weights 1,2,2,1. This gets you an estimate d(f_xy)/dx which you can then use for interpolation.

The normal way is by Newton's iterations, starting from the initial approximation (Xm, Ym) [assuming that the f are mere corrections]. Due to the particular shape of the equations, you can reduce to twice a single equation in a single unknown.
Xr = Xm - Fyx(Ym - Fxy(Xr))
Yr = Ym - Fxy(Xm - Fyx(Yr))
The iterations read
Xr <-- Xr - (Xm - Fyx(Ym - Fxy(Xr))) / (1 + Fxy'(Ym - Fxy(Xr)).Fxy'(Xr))
Yr <-- Yr - (Ym - Fxy(Xm - Fyx(Yr))) / (1 + Fyx'(Xm - Fyx(Yr)).Fyx'(Yr))
So you should tabulate the derivatives of f as well, though accuracy is not so critical than for the computation of the f themselves.
If the calibration points aren't too noisy, I would recommend cubic spline interpolation, for which you can precompute all coefficients. At the same time these coefficients allow you to estimate the derivative (as the corresponding quadratic interpolant, which is continuous).
In principle (unless the points are uniformly spaced), you need to perform a dichotomic search to determine the interval in which the argument lies. But here you will evaluate the functions at nearby values, so that a linear search from the previous location should be better.
A different way to address the problem is by considering the bivariate solution surfaces Xr = G(Xm, Ym) and Yr = G(Xm, Ym) that you compute on a grid of points. If the surfaces are smooth enough, you can use a coarse grid.
So by any method (such as the one above), you precompute the solutions at each grid node, as well as the coefficients of some interpolant in the X and Y directions. I recommend a cubic spline, again.
Now to interpolate inside a grid cell, you combine the two univarite interpolants to a bivariate one by means of the Coons formula https://en.wikipedia.org/wiki/Coons_patch.

Related

optimize integral f(x)exp(-x) from x=0,infinity

I need a robust integration algorithm for f(x)exp(-x) between x=0 and infinity, with f(x) a positive, differentiable function.
I do not know the array x a priori (it's an intermediate output of my routine). The x array is typically ~log-equispaced, but highly irregular.
Currently, I'm using the Simpson algorithm, buy my problem is that often the domain is highly undersampled by the x array, which produces unrealistic values for the integral.
On each run of my code I need to do this integration thousands of times (each with a different set of x values), so I need to find an efficient and robust way to integrate this function.
More details:
The x array can have between 2 and N points (N known). The first value is always x[0] = 0.0. The last point is always a value greater than a tunable threshold x_max (such that exp(x_max) approx 0). I only know the values of f at the points x[i] (though the function is a smooth function).
My first idea was to do a Laguerre-Gauss quadrature integration. However, this algorithm seems to be highly unreliable when one does not use the optimal quadrature points.
My current idea is to add a set of auxiliary points, interpolating f, such that the Simpson algorithm becomes more stable. If I do this, is there an optimal selection of auxiliary points?
I'd appreciate any advice,
Thanks.
Set t=1-exp(-x), then dt = exp(-x) dx and the integral value is equal to
integral[ f(-log(1-t)) , t=0..1 ]
which you can evaluate with the standard Simpson formula and hopefully get good results.
Note that piecewise linear interpolation will always result in an order 2 error for the integral, as the result amounts to a trapezoid formula even if the method was Simpson. For better errors in the Simpson method you will need higher interpolation degrees, ideally cubic splines. Cubic Bezier polynomials with estimated derivatives to compute the control points could be a fast compromise.

How is the decision boundary piloted after the parameters theta are updated

I have been learning about Machine learning algorithms this semester but I cant seem to understand how the parameters theta are used once Gradient decent is ran and they are updated, specifically in Logistic regression, In short my question is how is the decision boundary piloted after the parameters theta are updated.
After you use gradient descent to estimate your parameters theta, you can use those calculated parameters to make predictions.
For any input x, you can now calculate an predicted outcome y.
Ultimately the goal of machine learning is to make predictions.
So you take a whole bunch of observations x and y. Where x is your input and y is your output. In case of logistic regression, y is one of two values. For example, take a bunch of emails (x) that are labeled spam or no spam (y is 1 for spam and 0 for no spam). Or take a bunch of medical images that are labeled healthy or non healthy. ...
Feed all that data in your machine learning algorithm. Your algorithm (gradient descent for example), will calculate the theta coefficients.
Now you can use these theta coefficient to make predictions for new values of x. For example a new email that the system has never seen, using the theta coefficient, you can predict whether it is spam or not.
As far a plotting the decision boundary. This is probably feasible when you have two dimensions for x. You can have one dimension on each axis. And the resulting dots in your graph would be your y values. You could color them differently or show a different shape whether the result is one way or the other (i.e. your y is 0 or 1).
In practicality, these plots are useful during a lecture to get a general gist of what you're trying to do or accomplish. In reality, every input X would probably be a vector of many values (way more than 2). And thus it becomes impossible to plot a decision boundary.
Typically, logistic regression is parametrized in a following way:
cl(x|theta) = 1 / (1 + exp(-SUM_{i=1}^d theta_i x_i + theta_0 )) ) > 0.5
which is equivalent to
cl(x|theta) = sign(SUM_{i=1}^d theta_i x_i + theta_0 )
so once you get your theta, you use it to make a prediction by computing a simple weighted sum of your data representation and you check the sign of such number.

Constrained random solution of an underspecified system of linear equations

Premise
I've a system of linear equations
dot(A,x) = y
whose solutions have many degrees of freedom: indeed the Number of linearly independent Equations (E) is less than the dimension of x, A.K.A. the Number of Variables (N).
The number of degrees of freedom left constrains the solutions to be a hyperplane N-E of the overall space R^N. Given the (unimportant) characteristics of A, I am always able to write the solutions x (a vector N x 1) as
x=dot(B,t)+q
where B is a N x (N-E) matrix, t a (N-E) x 1 vector and q a N x 1 vector. This define the hyperplane of the solutions of my original problem, A x = y in parametric form.
I need to extract a random solution, with uniform probability over any possible point of the hyperplane, such that all x are positive (we will refer to it as a positive solution). Note that, for the specific problem I am dealing with, the space of positive solutions of x exists and it is bounded (that's how the notion of uniform probability is reasonable for the specific case, to clarify as suggested by #Petr comment). In the beginning, once I was able to write x=Bt+q, I thought it extremely simple. Now I am starting to doubt it.
Proposed Solution
By now I do something like this:
For each dimension i in range(N-E) I compute the maximum and minimum value of t[i]: t_min[i] and t_max[i]. Intervals big enough to not exclude any possible positive solution. Those are algebraically computed, always existing and defining a limited space.
I extract N-E uniform random values t[i], each comprised between t_min [i] and t_max[i].
I compute x = dot(B,t)+q
If all x[j] are positives, accept the solution. If some x[j] is negative, go back to point 2.
An example is visible for a two dimensional space N-E in the next figure.
Caption: A problem in N dimension reduced to a N-E=2 space. The yellow diamond is the space of positive solutions of the N-dimensional problem. I randomly sample points in the orange box between (t1(min),t2(min)) and (t1(max),t2(max)) until I find a point in the yellow box.
I think it is a good enough solution, but...
Problem
When N-E is big, the space of the hyperparallelogram bounded inside the hypercube can be small. In general it will be small^(N-E), that can be very small. How small?
While for sure an infinite number of positive solutions to the original problem exist, the space of the solutions can have measure zero in the N-E dimensional space. This can happen if all the positive solutions of the original problem have one dimension of x = 0. The borders of a diamond will make contact, transforming the diamond of solutions to a line. Of course you will never randomly pick EXACTLY a line in 2D, let alone in 5D.
A obvious idea would be to further reduce the dimensionality from N-E to a smaller number, i.e. to extract directly points from the aforementioned line instead of the square. Algebra is not easy, but I'm working on it. I'm not positive I will be able to solve it.
Note that choosing first one dimension (for example t1), computing the new limits of t2 conditional to the value of t1 extracted and then extract a possible value of t2 in this boundary, while much faster, does not give a uniform probability among all the possible solutions.
I know that the problem is very specific, but even some general ideas or thoughts would be gladly received. I am doubtful if there is some computing technique to extract directly the solution in the diamond...

Math - 3d positioning/multilateration

I have a problem involving 3d positioning - sort of like GPS. Given a set of known 3d coordinates (x,y,z) and their distances d from an unknown point, I want to find the unknown point. There can be any number of reference points, however there will be at least four.
So, for example, points are in the format (x,y,z,d). I might have:
(1,0,0,1)
(0,2,0,2)
(0,0,3,3)
(0,3,4,5)
And here the unknown point would be (0,0,0,0).
What would be the best way to go about this? Is there an existing library that supports 3d multilateration? (I have been unable to find one). Since it's unlikely that my data will have an exact solution (all of the 4+ spheres probably won't have a single perfect intersect point), the algorithm would need to be capable of approximating it.
So far, I was thinking of taking each subset of three points, triangulating the unknown based on those three, and then averaging all of the results. Is there a better way to do this?
You could take a non-linear optimisation approach, by defining a "cost" function that incorporates the distance error from each of your observation points.
Setting the unknown point at (x,y,z), and considering a set of N observation points (xi,yi,zi,di) the following function could be used to characterise the total distance error:
C(x,y,z) = sum( ((x-xi)^2 + (y-yi)^2 + (z-zi)^2 - di^2)^2 )
^^^
^^^ for all observation points i = 1 to N
This is the sum of the squared distance errors for all points in the set. (It's actually based on the error in the squared distance, so that there are no square roots!)
When this function is at a minimum the target point (x,y,z) would be at an optimal position. If the solution gives C(x,y,z) = 0 all observations would be exactly satisfied.
One apporach to minimise this type of equation would be Newton's method. You'd have to provide an initial starting point for the iteration - possibly a mean value of the observation points (if they en-circle (x,y,z)) or possibly an initial triangulated value from any three observations.
Edit: Newton's method is an iterative algorithm that can be used for optimisation. A simple version would work along these lines:
H(X(k)) * dX = G(X(k)); // solve a system of linear equations for the
// increment dX in the solution vector X
X(k+1) = X(k) - dX; // update the solution vector by dX
The G(X(k)) denotes the gradient vector evaluated at X(k), in this case:
G(X(k)) = [dC/dx
dC/dy
dC/dz]
The H(X(k)) denotes the Hessian matrix evaluated at X(k), in this case the symmetric 3x3 matrix:
H(X(k)) = [d^2C/dx^2 d^2C/dxdy d^2C/dxdz
d^2C/dydx d^2C/dy^2 d^2C/dydz
d^2C/dzdx d^2C/dzdy d^2C/dz^2]
You should be able to differentiate the cost function analytically, and therefore end up with analytical expressions for G,H.
Another approach - if you don't like derivatives - is to approximate G,H numerically using finite differences.
Hope this helps.
Non-linear solution procedures are not required. You can easily linearise the system. If you take pair-wise differences
$(x-x_i)^2-(x-x_j)^2+(y-y_i)^2-(y-y_j)^2+(z-z_i)^2-(z-z_j)^2=d_i^2-d_j^2$
then a bit of algebra yields the linear equations
$(x_i-x_j) x +(y_i-y_j) y +(zi-zj) z=-1/2(d_i^2-d_j^2+ds_i^2-ds_j^2)$,
where $ds_i$ is the distance from the $i^{th}$ sensor to the origin. These are the equations of the planes defined by intersecting the $i^{th}$ and the $j^{th}$ spheres.
For four sensors you obtain an over-determined linear system with $4 choose 2 = 6$ equations. If $A$ is the resulting matrix and $b$ the corresponding vector of RHS, then you can solve the normal equations
$A^T A r = A^T b$
for the position vector $r$. This will work as long as your sensors are not coplanar.
If you can spend the time, an iterative solution should approach the correct solution pretty quickly. So pick any point the correct distance from site A, then go round the set working out the distance to the point then adjusting the point so that it's in the same direction from the site but the correct distance. Continue until your required precision is met (or until the point is no longer moving far enough in each iteration that it can meet your precision, as per the possible effects of approximate input data).
For an analytic approach, I can't think of anything better than what you already propose.

How do Trigonometric functions work? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
So in high school math, and probably college, we are taught how to use trig functions, what they do, and what kinds of problems they solve. But they have always been presented to me as a black box. If you need the Sine or Cosine of something, you hit the sin or cos button on your calculator and you're set. Which is fine.
What I'm wondering is how trigonometric functions are typically implemented.
First, you have to do some sort of range reduction. Trig functions are periodic, so you need to reduce arguments down to a standard interval. For starters, you could reduce angles to be between 0 and 360 degrees. But by using a few identities, you realize you could get by with less. If you calculate sines and cosines for angles between 0 and 45 degrees, you can bootstrap your way to calculating all trig functions for all angles.
Once you've reduced your argument, most chips use a CORDIC algorithm to compute the sines and cosines. You may hear people say that computers use Taylor series. That sounds reasonable, but it's not true. The CORDIC algorithms are much better suited to efficient hardware implementation. (Software libraries may use Taylor series, say on hardware that doesn't support trig functions.) There may be some additional processing, using the CORDIC algorithm to get fairly good answers but then doing something else to improve accuracy.
There are some refinements to the above. For example, for very small angles theta (in radians), sin(theta) = theta to all the precision you have, so it's more efficient to simply return theta than to use some other algorithm. So in practice there is a lot of special case logic to squeeze out all the performance and accuracy possible. Chips with smaller markets may not go to as much optimization effort.
edit: Jack Ganssle has a decent discussion in his book on embedded systems, "The Firmware Handbook".
FYI: If you have accuracy and performance constraints, Taylor series should not be used to approximate functions for numerical purposes. (Save them for your Calculus courses.) They make use of the analyticity of a function at a single point, e.g. the fact that all its derivatives exist at that point. They don't necessarily converge in the interval of interest. Often they do a lousy job of distributing the function approximation's accuracy in order to be "perfect" right near the evaluation point; the error generally zooms upwards as you get away from it. And if you have a function with any noncontinuous derivative (e.g. square waves, triangle waves, and their integrals), a Taylor series will give you the wrong answer.
The best "easy" solution, when using a polynomial of maximum degree N to approximate a given function f(x) over an interval x0 < x < x1, is from Chebyshev approximation; see Numerical Recipes for a good discussion. Note that the Tj(x) and Tk(x) in the Wolfram article I linked to used the cos and inverse cosine, these are polynomials and in practice you use a recurrence formula to get the coefficients. Again, see Numerical Recipes.
edit: Wikipedia has a semi-decent article on approximation theory. One of the sources they cite (Hart, "Computer Approximations") is out of print (& used copies tend to be expensive) but goes into a lot of detail about stuff like this. (Jack Ganssle mentions this in issue 39 of his newsletter The Embedded Muse.)
edit 2: Here's some tangible error metrics (see below) for Taylor vs. Chebyshev for sin(x). Some important points to note:
that the maximum error of a Taylor series approximation over a given range, is much larger than the maximum error of a Chebyshev approximation of the same degree. (For about the same error, you can get away with one fewer term with Chebyshev, which means faster performance)
Range reduction is a huge win. This is because the contribution of higher order polynomials shrinks down when the interval of the approximation is smaller.
If you can't get away with range reduction, your coefficients need to be stored with more precision.
Don't get me wrong: Taylor series will work properly for sine/cosine (with reasonable precision for the range -pi/2 to +pi/2; technically, with enough terms, you can reach any desired precision for all real inputs, but try to calculate cos(100) using Taylor series and you can't do it unless you use arbitrary-precision arithmetic). If I were stuck on a desert island with a nonscientific calculator, and I needed to calculate sine and cosine, I would probably use Taylor series since the coefficients are easy to remember. But the real world applications for having to write your own sin() or cos() functions are rare enough that you'd be best off using an efficient implementation to reach a desired accuracy -- which the Taylor series is not.
Range = -pi/2 to +pi/2, degree 5 (3 terms)
Taylor: max error around 4.5e-3, f(x) = x-x3/6+x5/120
Chebyshev: max error around 7e-5, f(x) = 0.9996949x-0.1656700x3+0.0075134x5
Range = -pi/2 to +pi/2, degree 7 (4 terms)
Taylor: max error around 1.5e-4, f(x) = x-x3/6+x5/120-x7/5040
Chebyshev: max error around 6e-7, f(x) = 0.99999660x-0.16664824x3+0.00830629x5-0.00018363x7
Range = -pi/4 to +pi/4, degree 3 (2 terms)
Taylor: max error around 2.5e-3, f(x) = x-x3/6
Chebyshev: max error around 1.5e-4, f(x) = 0.999x-0.1603x3
Range = -pi/4 to +pi/4, degree 5 (3 terms)
Taylor: max error around 3.5e-5, f(x) = x-x3/6+x5
Chebyshev: max error around 6e-7, f(x) = 0.999995x-0.1666016x3+0.0081215x5
Range = -pi/4 to +pi/4, degree 7 (4 terms)
Taylor: max error around 3e-7, f(x) = x-x3/6+x5/120-x7/5040
Chebyshev: max error around 1.2e-9, f(x) = 0.999999986x-0.166666367x3+0.008331584x5-0.000194621x7
I believe they're calculated using Taylor Series or CORDIC. Some applications which make heavy use of trig functions (games, graphics) construct trig tables when they start up so they can just look up values rather than recalculating them over and over.
Check out the Wikipedia article on trig functions. A good place to learn about actually implementing them in code is Numerical Recipes.
I'm not much of a mathematician, but my understanding of where sin, cos, and tan "come from" is that they are, in a sense, observed when you're working with right-angle triangles. If you take measurements of the lengths of sides of a bunch of different right-angle triangles and plot the points on a graph, you can get sin, cos, and tan out of that. As Harper Shelby points out, the functions are simply defined as properties of right-angle triangles.
A more sophisticated understanding is achieved by understanding how these ratios relate to the geometry of circle, which leads to radians and all of that goodness. It's all there in the Wikipedia entry.
Most commonly for computers, power series representation is used to calculate sines and cosines and these are used for other trig functions. Expanding these series out to about 8 terms computes the values needed to an accuracy close to the machine epsilon (smallest non-zero floating point number that can be held).
The CORDIC method is faster since it is implemented on hardware, but it is primarily used for embedded systems and not standard computers.
I would like to extend the answer provided by #Jason S. Using a domain subdivision method similar to that described by #Jason S and using Maclaurin series approximations, an average (2-3)X speedup over the tan(), sin(), cos(), atan(), asin(), and acos() functions built into the gcc compiler with -O3 optimization was achieved. The best Maclaurin series approximating functions described below achieved double precision accuracy.
For the tan(), sin(), and cos() functions, and for simplicity, an overlapping 0 to 2pi+pi/80 domain was divided into 81 equal intervals with "anchor points" at pi/80, 3pi/80, ..., 161pi/80. Then tan(), sin(), and cos() of these 81 anchor points were evaluated and stored. With the help of trig identities, a single Maclaurin series function was developed for each trig function. Any angle between ±infinity may be submitted to the trig approximating functions because the functions first translate the input angle to the 0 to 2pi domain. This translation overhead is included in the approximation overhead.
Similar methods were developed for the atan(), asin(), and acos() functions, where an overlapping -1.0 to 1.1 domain was divided into 21 equal intervals with anchor points at -19/20, -17/20, ..., 19/20, 21/20. Then only atan() of these 21 anchor points was stored. Again, with the help of inverse trig identities, a single Maclaurin series function was developed for the atan() function. Results of the atan() function were then used to approximate asin() and acos().
Since all inverse trig approximating functions are based on the atan() approximating function, any double-precision argument input value is allowed. However the argument input to the asin() and acos() approximating functions is truncated to the ±1 domain because any value outside it is meaningless.
To test the approximating functions, a billion random function evaluations were forced to be evaluated (that is, the -O3 optimizing compiler was not allowed to bypass evaluating something because some computed result would not be used.) To remove the bias of evaluating a billion random numbers and processing the results, the cost of a run without evaluating any trig or inverse trig function was performed first. This bias was then subtracted off each test to obtain a more representative approximation of actual function evaluation time.
Table 2. Time spent in seconds executing the indicated function or functions one billion times. The estimates are obtained by subtracting the time cost of evaluating one billion random numbers shown in the first row of Table 1 from the remaining rows in Table 1.
Time spent in tan(): 18.0515 18.2545
Time spent in TAN3(): 5.93853 6.02349
Time spent in TAN4(): 6.72216 6.99134
Time spent in sin() and cos(): 19.4052 19.4311
Time spent in SINCOS3(): 7.85564 7.92844
Time spent in SINCOS4(): 9.36672 9.57946
Time spent in atan(): 15.7160 15.6599
Time spent in ATAN1(): 6.47800 6.55230
Time spent in ATAN2(): 7.26730 7.24885
Time spent in ATAN3(): 8.15299 8.21284
Time spent in asin() and acos(): 36.8833 36.9496
Time spent in ASINCOS1(): 10.1655 9.78479
Time spent in ASINCOS2(): 10.6236 10.6000
Time spent in ASINCOS3(): 12.8430 12.0707
(In the interest of saving space, Table 1 is not shown.) Table 2 shows the results of two separate runs of a billion evaluations of each approximating function. The first column is the first run and the second column is the second run. The numbers '1', '2', '3' or '4' in the function names indicate the number of terms used in the Maclaurin series function to evaluate the particular trig or inverse trig approximation. SINCOS#() means that both sin and cos were evaluated at the same time. Likewise, ASINCOS#() means both asin and acos were evaluated at the same time. There is little extra overhead in evaluating both quantities at the same time.
The results show that increasing the number of terms slightly increases execution time as would be expected. Even the smallest number of terms gave around 12-14 digit accuracy everywhere except for the tan() approximation near where its value approaches ±infinity. One would expect even the tan() function to have problems there.
Similar results were obtained on a high-end MacBook Pro laptop in Unix and on a high-end desktop computer in Linux.
If your asking for a more physical explanation of sin, cos, and tan consider how they relate to right-angle triangles. The actual numeric value of cos(lambda) can be found by forming a right-angle triangle with one of the angles being lambda and dividing the length of the triangles side adjacent to lambda by the length of the hypotenuse. Similarily for sin use the opposite side divided by the hypotenuse. For tangent use the opposite side divided by the adjacent side. The classic memonic to remember this is SOHCAHTOA (pronounced socatoa).

Resources