Bounding errors in numerical calculations - precision

I have a theoretical math problem where I need to calculate a numerical value and bound its possible range. I can bound the numerical error of each operation by hand, but since some of them are fairly involved (Taylor expansions, elliptic integrals), I would expect a simple way to do this to already exist (in any language, to be honest, but Python is always a plus).
Can you point me in the right direction?
Thanks!
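One established technique for this kind of problem is interval arithmetic: every quantity is carried as a pair of bounds, and every operation rounds those bounds outward. Libraries such as mpmath (its `iv` context) offer this ready-made. The following is only a minimal sketch of the idea using the standard library (Python 3.9+ for `math.nextafter`). The names `Interval`, `div_int` and `exp_enclosure` are hypothetical, and the Taylor remainder bound is an assumption that holds only for 0 <= x <= 1.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] intended to contain the exact value."""
    lo: float
    hi: float

    @staticmethod
    def _outward(lo: float, hi: float) -> "Interval":
        # Round-to-nearest is off by at most half an ulp, so stepping each
        # endpoint one float outward keeps the exact result inside.
        return Interval(math.nextafter(lo, -math.inf),
                        math.nextafter(hi, math.inf))

    def __add__(self, other: "Interval") -> "Interval":
        return Interval._outward(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other: "Interval") -> "Interval":
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval._outward(min(products), max(products))

    def div_int(self, k: int) -> "Interval":
        # Dividing by a positive integer keeps the endpoint order.
        return Interval._outward(self.lo / k, self.hi / k)


def exp_enclosure(x: float, terms: int = 15) -> Interval:
    """Enclose exp(x) for 0 <= x <= 1: Taylor sum plus Lagrange remainder."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("this sketch only handles 0 <= x <= 1")
    point = Interval(x, x)
    term = Interval(1.0, 1.0)    # encloses x**k / k!
    total = Interval(1.0, 1.0)   # encloses the partial Taylor sum
    for k in range(1, terms + 1):
        term = (term * point).div_int(k)
        total = total + term
    # Remainder after n terms is exp(c) * x**(n+1) / (n+1)! for some c in [0, x].
    # It lies in [0, 3 * x**n / n!] because exp(c) < 3 and x / (n+1) <= 1.
    return total + Interval(0.0, 3.0) * term


enclosure = exp_enclosure(0.5)
print(enclosure.lo, enclosure.hi)
```

The width `enclosure.hi - enclosure.lo` is then a rigorous bound on the total error (rounding plus truncation) for this particular computation, under the assumptions stated above.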

Related

How to find accuracy of matrix multiplication with floating-point numbers?

I am trying to analyze how floating-point computation becomes more inaccurate when the data size decreases. In order to do that, I wanted to perform simple matrix operations on different variations of floating point representation, such as float64, float32, and float16. Since float64 computation will give the most precise and accurate result out of the three, I assume all float64 computation to give the expected result (i.e., error = 0).
The issue is that when I compare the calculated result with the expected result, I don't know how to combine all the individual element errors into a single metric. I know about certain ways to go about it, such as taking the mean error or the sum of squared errors (SSE), but I wanted to know if there was a standard way of calculating the overall error of a given matrix computation.
Perhaps a variant of the condition number can be helpful? See here: https://en.wikipedia.org/wiki/Condition_number#Matrices
OP asks "if there was a standard way of calculating the overall error of a given matrix computation."
Consider the case when the matrix is of size 1. Then we are in the familiar one-dimensional domain.
How to compare y_computed_as_float vs y_expected? Even in this case, there is no standard for how these should be compared as floating-point numbers. Subtract? Divide? It is often context sensitive. So "no" to OP's question.
Yet there are common practices. So a potential "yes" to OP's question for select cases.
Floating point computations are often assessed by the difference between computed and math expected values scaled by the Unit in the last place*.
error = (y_computed_as_float - y_expected)/ulpf((float) y_expected);
For an N×N matrix, the matrix error could use the root mean square of the N² element errors.
* Scaling by ULP has some issues near each power of 2, and more near 0.0. There are ways to mitigate that, but we are getting into the weeds.
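The ULP-scaled error and its root-mean-square aggregate can be sketched in Python as follows. This is an illustration under assumptions, not a standard: float32 is emulated by round-tripping through `struct`, and `to_float32`, `ulp32` and `rms_ulp_error` are hypothetical helper names. The ULP is taken at the expected value, as in the formula above.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def ulp32(x: float) -> float:
    """Spacing between adjacent float32 values at the magnitude of x."""
    if x == 0.0:
        return 2.0 ** -149                    # smallest float32 subnormal
    _, exponent = math.frexp(abs(x))          # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return 2.0 ** max(exponent - 24, -149)    # float32 has a 24-bit significand


def rms_ulp_error(computed, expected):
    """Root mean square of per-element errors, each measured in float32 ULPs."""
    errors = [(c - e) / ulp32(e)
              for row_c, row_e in zip(computed, expected)
              for c, e in zip(row_c, row_e)]
    return math.sqrt(sum(err * err for err in errors) / len(errors))


expected = [[0.1, 0.2], [0.3, 0.4]]                           # float64 reference
computed = [[to_float32(v) for v in row] for row in expected]  # pure rounding error
score = rms_ulp_error(computed, expected)
print(score)
```

Here the only error is a single rounding per element, so the score cannot exceed 0.5 ULP. A real float32 matrix product would accumulate more.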

Finding optimal solution to multivariable function with non-negligible solution time?

So I have this issue where I have to find the best distribution that, when passed through a function, matches a known surface. I have written a script that creates the distribution given some parameters and spits out a metric that compares the resulting surface to the known one, but this script takes a non-negligible time, so I can't just run through a very large set of parameters to find the optimal set. I looked into the simplex method, and it seems to be the right path, but it's not quite what I need, because I don't exactly have a set of linear equations and don't know the constraints for the parameters; I only have one method that gives a single output (and that's all). Can anyone point me in the right direction on how to solve this problem? Thanks!
To quickly go over my process / problem again, I have a set of parameters (at this point 2 but will be expanded to more later) that defines a distribution. This distribution is used to create a surface, which is compared to a known surface, and an error metric is produced. I want to find the optimal set of parameters, but cannot run through an arbitrarily large number of parameters due to the time constraint.
One situation consistent with what you have asked is a model in which you have a reasonably tractable probability distribution which generates an unknown value. This unknown value goes through a complex and not mathematically nice process and generates an observation. Your surface corresponds to the observed probability distribution on the observations. You would be happy finding the parameters that give a good least squares fit between the theoretical and real life surface distribution.
One approximation for the fitting process is that you compute a grid of values in the space output by the probability distribution. Each set of parameters gives you a probability for each point on this grid. The not nice process maps each grid point here to a nearest grid point in the space of the surface. The least squares fit is a quadratic in the probabilities calculated for the first grid, because the probabilities calculated for a grid point in the surface are the sums of the probabilities calculated for values in the first grid that map to something nearer to that point in the surface than any other point in the surface. This means that it has first (and even second) derivatives that you can calculate. If your probability distribution is nice enough you can use the chain rule to calculate derivatives for the least squares fit in the initial parameters. This means that you can use optimization methods to calculate the best fit parameters which require not just a means to calculate the function to be optimized but also its derivatives, and these are generally more efficient than optimization methods which require only function values, such as Nelder-Mead or Torczon Simplex. See e.g. http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/optim/package-summary.html.
Another possible approach is via something called the EM Algorithm. Here EM stands for Expectation-Maximization. It can be used for finding maximum likelihood fits in cases where the problem would be easy if you could see some hidden state that you cannot actually see. In this case the output produced by the initial distribution might be such a hidden state. One starting point is http://www-prima.imag.fr/jlc/Courses/2002/ENSI2.RNRF/EM-tutorial.pdf.
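If derivatives are not available, a derivative-free search that caches every expensive evaluation is a reasonable starting point. The sketch below uses compass (pattern) search, which is a simpler relative of the Nelder-Mead and Torczon simplex methods mentioned above, not either of those methods themselves. `compass_search` and `example_metric` are hypothetical names, and the quadratic metric only stands in for the slow script.

```python
def compass_search(f, x0, step=1.0, tol=1e-6, max_evals=500):
    """Minimize f by probing +/- step along each axis, halving step on failure.

    Every evaluation is cached, so revisited points cost nothing.
    max_evals is a soft budget checked once per sweep.
    """
    cache = {}

    def cached(x):
        if x not in cache:
            cache[x] = f(x)
        return cache[x]

    x = tuple(float(v) for v in x0)
    fx = cached(x)
    while step > tol and len(cache) < max_evals:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:i] + (x[i] + delta,) + x[i + 1:]
                f_trial = cached(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step /= 2          # no neighbour is better: refine the mesh
    return x, fx, len(cache)


def example_metric(params):
    """Stand-in for the expensive script: error between two surfaces."""
    a, b = params
    return (a - 1.5) ** 2 + (b + 0.5) ** 2


best, best_error, evaluations = compass_search(example_metric, (0.0, 0.0))
print(best, best_error, evaluations)
```

The evaluation count is the quantity to watch here, since each call to the real metric is slow. The cache keeps the method from paying twice for a point it has already visited.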

Theory on how to find the equation of a curve given a variable number of data points

I have recently started working on a project. One of the problems I ran into was converting changing accelerations into velocity. Accelerations at different points in time are provided through sensors. If you get the equation of these data points, the derivative of a certain time (x) on that equation will be the velocity.
I know how to do this on the computer, but how would I get the equation to start with? I have searched around but I have not found any existing programs that can form an equation given a set of points. In the past, I have created a neural net algorithm to form an equation, but it takes an incredibly long time to run.
If someone can link me a program or explain the process of doing this, that would be fantastic.
Sorry if this is in the wrong forum. I would post into math, but a programming background will be needed to know the realm of possibility of what a computer can do quickly.
This started out as a comment but ended up being too big.
Just to make sure you're familiar with the terminology...
Differentiation takes a function f(t) and spits out a new function f'(t) that tells you how f(t) changes with time (i.e. f'(t) gives the slope of f(t) at time t). This takes you from displacement to velocity or from velocity to acceleration.
Integration takes a function f(t) and spits out a new function F(t) which measures the area under the function f(t) from the beginning of time up until a given point t. What's not obvious at first is that integration is actually the reverse of differentiation, a fact called the Fundamental Theorem of Calculus. So integration takes you from acceleration to velocity or velocity to displacement.
You don't need to understand the rules of calculus to do numerical integration. The simplest (and most naive) method for integrating a function numerically is to approximate the area by dividing it up into small slices between time points and summing the areas of the resulting rectangles. This approximating sum is called a Riemann sum.
This tends to noticeably overshoot and undershoot certain parts of the function. A more accurate but still very simple method is the trapezoid rule, which also approximates the function with a series of slices, except the tops of the slices are straight lines between the function values rather than constant values.
Still more complicated, but a better approximation yet, is Simpson's rule, which approximates the function with parabolas between time points.
You can think of each of these methods as getting a better approximation of the integral because they each use more information about the function. The first method uses just one data point per area (a constant flat line), the second method uses two data points per area (a straight line), and the third method uses three data points per area (a parabola).
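The three rules can be compared side by side on sampled data. This is a minimal sketch with hypothetical function names. The samples are taken from f(t) = t^2 on [0, 2], whose exact integral is 8/3, purely as a test case.

```python
def left_riemann(ts, ys):
    """Rectangles whose height is the sample at the left edge of each slice."""
    return sum(ys[i] * (ts[i + 1] - ts[i]) for i in range(len(ts) - 1))


def trapezoid(ts, ys):
    """Slices topped by straight lines between neighbouring samples."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (ts[i + 1] - ts[i])
               for i in range(len(ts) - 1))


def simpson(ts, ys):
    """Composite Simpson's rule; needs an even number of equal-width slices."""
    n = len(ts) - 1
    if n < 2 or n % 2:
        raise ValueError("Simpson's rule needs an even number of slices")
    h = (ts[-1] - ts[0]) / n
    return h / 3 * (ys[0] + ys[-1]
                    + 4 * sum(ys[1:-1:2])    # odd-indexed interior samples
                    + 2 * sum(ys[2:-1:2]))   # even-indexed interior samples


ts = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [t * t for t in ts]
print(left_riemann(ts, ys))   # 1.75
print(trapezoid(ts, ys))      # 2.75
print(simpson(ts, ys))        # about 2.6667, exact for a parabola
```

With the same five samples, the error shrinks from about 0.92 to about 0.08 to essentially zero, which matches the "more information per slice" argument above.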
You could read up on the math behind these methods here or in the first page of this pdf.
I agree with the comments that numerical integration is probably what you want. In case you still want a function going through your data, let me further argue against doing that.
It's usually a bad idea to find a curve that goes exactly through some given points. In almost any applied math context you have to accept that there is a little noise in the inputs, and a curve going exactly through the points may be very sensitive to noise. This can produce garbage outputs. Finding a curve going exactly through a set of points is asking for overfitting to get a function that memorizes rather than understands the data, and does not generalize.
For example, take the points (0,0), (1,1), (2,4), (3,9), (4,16), (5,25), (6,36). These are seven points on y=x^2, which is fine. The value of x^2 at x=-1 is 1. Now what happens if you replace (3,9) with (2.9,9.1)? There is a sixth order polynomial passing through all 7 points,
4.66329x - 8.87063x^2 + 7.2281x^3 - 2.35108x^4 + 0.349747x^5 - 0.0194304x^6.
The value of this at x=-1 is -23.4823, very far from 1. While the curve looks ok between 0 and 2, in other examples you can see large oscillations between the data points.
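The numbers above can be checked without computing the coefficients, by evaluating the interpolating polynomial directly in Lagrange form. A small sketch, where `lagrange_eval` is a hypothetical helper name:

```python
def lagrange_eval(points, x):
    """Value at x of the unique polynomial passing through all the points."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        weight = 1.0
        for j, (xj, _) in enumerate(points):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


clean = [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36)]
noisy = [(0, 0), (1, 1), (2, 4), (2.9, 9.1), (4, 16), (5, 25), (6, 36)]

clean_value = lagrange_eval(clean, -1)   # close to 1, as for y = x^2
noisy_value = lagrange_eval(noisy, -1)   # close to -23.48
print(clean_value, noisy_value)
```

Moving a single point by 0.1 changes the extrapolated value by more than 24.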
Once you accept that you want an approximation, not a curve going exactly through the points, you have what is known as a regression problem. There are many types of regression. Typically, you choose a set of functions and a way to measure how well a function approximates the data. If you use a simple set of functions like lines (linear regression), you just find the best fit. If you use a more complicated family of functions, you should use regularization to penalize overly complicated functions such as high degree polynomials with large coefficients that memorize the data. If you either use a simple family or regularization, the function tends not to change much when you add or withhold a few data points, which indicates that it is a meaningful trend in the data.
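As a sketch of the regression alternative, assume (purely for illustration) that the chosen family is y = c * x^2 with a single parameter c. The least-squares estimate then has the closed form c = sum(x^2 * y) / sum(x^4). The function name `fit_scaled_square` is hypothetical.

```python
def fit_scaled_square(points):
    """Least-squares estimate of c in the model y = c * x**2."""
    numerator = sum(x * x * y for x, y in points)
    denominator = sum(x ** 4 for x, _ in points)
    return numerator / denominator


noisy = [(0, 0), (1, 1), (2, 4), (2.9, 9.1), (4, 16), (5, 25), (6, 36)]
without_outlier = [p for p in noisy if p != (2.9, 9.1)]

c_all = fit_scaled_square(noisy)
c_partial = fit_scaled_square(without_outlier)
prediction = c_all * (-1) ** 2        # fitted value at x = -1
print(c_all, c_partial, prediction)
```

The estimate barely moves when the perturbed point is withheld, and the prediction at x = -1 stays close to 1, which is the stability property described above.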
Unfortunately, integrating accelerometer data to get velocity is a numerically unstable problem. For most applications, your error will diverge far too soon to get results of any practical value.
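Recall that acceleration is the time derivative of velocity:
$a(t) = \frac{dv}{dt}$
So velocity is the time integral of acceleration:
$v(t) = v(0) + \int_0^t a(\tau)\,d\tau$
However well you fit a function to your accelerometer data, you will still essentially be doing a piecewise interpolation of the underlying acceleration function:
$v(t_n) \approx v(0) + \sum_{i=1}^{n} \left(a(t_i) + \epsilon_i\right)\Delta t$
Where the error terms $\epsilon_i$ from each integration step will add!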
Typically you will see wildly inaccurate results after just a few seconds.
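The drift is easy to reproduce in a simulation. The sketch below assumes a sensor at rest (true acceleration and velocity are both zero) whose readings carry a small constant bias plus Gaussian noise. The bias, noise level and sample rate are made-up illustrative values, not properties of any particular device.

```python
import random


def integrate_velocity(accels, dt, v0=0.0):
    """Cumulative trapezoid rule: velocity estimate after each sample."""
    velocities = [v0]
    for a_prev, a_next in zip(accels, accels[1:]):
        velocities.append(velocities[-1] + 0.5 * (a_prev + a_next) * dt)
    return velocities


rng = random.Random(0)            # fixed seed for repeatability
dt = 0.01                         # assumed 100 Hz sampling
n_samples = 1001                  # 10 seconds of data
bias = 0.05                       # assumed sensor bias in m/s^2
noise_sigma = 0.1                 # assumed noise standard deviation in m/s^2

# The device does not move, so everything measured is error.
measured = [bias + rng.gauss(0.0, noise_sigma) for _ in range(n_samples)]
velocity = integrate_velocity(measured, dt)
drift = velocity[-1]              # true velocity is 0, so this is pure error
print(drift)
```

With these assumed values the velocity estimate drifts by roughly 0.5 m/s after ten seconds, and integrating once more to get position makes the error grow quadratically with time.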

Fast find of all local maximums in C++

Problem
I have a formula for calculation of 1D polynomial, joint function. I want to find all local maximums of that function within a given range.
My approach
My current solution is to evaluate my function at a certain number of points in the range, then go through these points and remember the points where the function changed from rising to falling. Of course I can change the number of samples within the interval, but I want to find all maxima with as few samples as possible.
Question
Can you suggest an effective algorithm?
Finding all the maxima of an unknown function is hard. You can never be sure that a maximum you found is really just one maximum or that you have not overlooked a maximum somewhere.
However, if something is known about the function, you can try to exploit that. The simplest case is, of course, when the function is known to be a polynomial of bounded degree. For a polynomial of degree up to five, the derivative is at most a quartic, so all (up to four) extrema can be derived from a closed formula; see http://en.wikipedia.org/wiki/Quartic_equation#General_formula_for_roots for details. Most likely you don't want to implement that, but for linear, quadratic, and cubic equations the closed formula is feasible and can be used to find the maxima of a quartic function.
That is only the simplest kind of information that might be known. Another useful piece of information is whether you can bound the second derivative: this would allow you to reduce the sampling density wherever you find a steep slope.
You may also be able to exploit information from how you intend to use the maxima you found. It can give you clues about how much precision you need. Is it sufficient to know that a point is near a maximum? Or that a point is flat? Is it really a problem if a saddle point is classified as a maximum? Or if a maximum right next to a turning point is overlooked? And how much is the allowable error margin?
If you cannot exploit information like this, you are thrown back to sampling your function in small steps and hoping you don't make too much of an error.
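The sampling fallback can at least be made cheaper by using a coarse grid only to bracket each maximum and then refining inside the bracket with golden-section search. This is a sketch in Python rather than C++, with hypothetical function names. It assumes the grid is fine enough that each bracket holds a single maximum, which is exactly the assumption that cannot be verified for an unknown function.

```python
import math


def golden_max(f, a, b, tol=1e-8):
    """Golden-section search for the maximum of a unimodal f on [a, b]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                 # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                       # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def find_local_maxima(f, lo, hi, samples=50):
    """Bracket rising-then-falling samples, then refine each bracket."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    ys = [f(x) for x in xs]
    maxima = []
    for i in range(1, samples - 1):
        if ys[i - 1] < ys[i] >= ys[i + 1]:
            maxima.append(golden_max(f, xs[i - 1], xs[i + 1]))
    return maxima


peaks = find_local_maxima(math.sin, 0.0, 10.0)
print(peaks)   # two values, near pi/2 and 5*pi/2
```

Each refinement step reuses one of the two interior evaluations, so the accuracy comes from a few dozen extra calls per maximum instead of from a denser global grid.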
Edit:
You mention in the comments that your function is in fact a kernel density estimation. This gives you at least the following information:
Unless the kernel has unbounded support (is not limited in extent), your estimated function will be a piecewise function: any point on it is influenced only by a precisely calculable number of measurement points.
If the kernel is based on a polynomial, the resulting estimated function will be piecewise polynomial. And it will be of the same degree as the kernel!
If the kernel is the uniform kernel, your estimated function will be a step function.
This case needs special handling because there won't be any maxima in the mathematical sense. However, it also makes your job really easy.
If the kernel is the triangular kernel, your estimated function will be a piecewise linear function.
If the kernel is the Epanechnikov kernel, your estimated function will be a piecewise quadratic function.
In all these cases it is next to trivial to produce the piecewise functions and to find their maxima.
If the kernel is of too high a degree or transcendental, you still know the measurements that your estimation is based on, and you know the kernel properties. This allows you to derive a heuristic on how dense your maxima can get.
At the very least, you know the first and second derivative of the kernel.
In principle, this allows you to calculate the first and second derivative of the estimated function at any point.
In the case of a local kernel, it might be more prudent to calculate the first derivative and an upper bound to the second derivative of the estimated function at any point.
With this information, it should be possible to constrain the search to the regions where there are maxima and avoid oversampling of the slopes.
As you see, there is a lot of useful information that you can derive from the knowledge of your function, and which you can use to your advantage.
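For a transcendental kernel, the known kernel derivative can be used directly. The sketch below assumes a Gaussian kernel with bandwidth `h` (an assumption, since the actual kernel is not stated), evaluates the derivative of the estimate on a grid, and bisects every interval where it changes from positive to non-positive. The names `kde_derivative` and `kde_modes` are hypothetical, and the fixed grid is assumed to be fine relative to `h`.

```python
import math


def kde_derivative(x, data, h):
    """Derivative of a Gaussian kernel density estimate at x.

    The positive normalisation constant is dropped because only the sign
    and the zero crossings matter here.
    """
    return sum(-(x - xi) / h ** 2 * math.exp(-0.5 * ((x - xi) / h) ** 2)
               for xi in data)


def kde_modes(data, h, grid_points=200):
    """Local maxima of the estimate: derivative goes from > 0 to <= 0."""
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    xs = [lo + (hi - lo) * i / (grid_points - 1) for i in range(grid_points)]
    modes = []
    for a, b in zip(xs, xs[1:]):
        if kde_derivative(a, data, h) > 0 >= kde_derivative(b, data, h):
            for _ in range(60):                  # plain bisection on the sign
                mid = 0.5 * (a + b)
                if kde_derivative(mid, data, h) > 0:
                    a = mid
                else:
                    b = mid
            modes.append(0.5 * (a + b))
    return modes


data = [-0.1, 0.0, 0.1, 4.9, 5.0, 5.1]
modes = kde_modes(data, h=0.5)
print(modes)   # two values, near 0 and 5
```

Only derivative sign changes are refined, so the flat tails and the steep slopes between clusters cost a single evaluation per grid point.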
The local maxima are among the roots of the first derivative. To isolate those roots in your working interval you can use Sturm's theorem and proceed by dichotomy (bisection). In theory (using exact arithmetic) it gives you all real roots.
An equivalent approach is to express your polynomial in the Bezier/Bernstein basis and look for changes of signs of the coefficients (hull property). Dichotomic search can be efficiently implemented by recursive subdivision of the Bezier.
There are several classical algorithms available for polynomials, such as Laguerre, that usually look for the complex roots as well.
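The Sturm approach can be sketched with exact rational arithmetic. This is an illustrative Python version, not a tuned C++ implementation. It assumes the polynomial has degree at least 2 and that its derivative has no repeated roots. Coefficients are listed from the highest degree down, and all function names are hypothetical.

```python
from fractions import Fraction


def evaluate(p, x):
    """Horner evaluation of polynomial p at x."""
    acc = Fraction(0)
    for c in p:
        acc = acc * x + c
    return acc


def derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]


def remainder(a, b):
    """Remainder of the polynomial long division a / b."""
    a = list(a)
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)                      # leading coefficient is now zero
    while a and a[0] == 0:
        a.pop(0)
    return a


def sturm_chain(p):
    chain = [p, derivative(p)]
    while True:
        r = remainder(chain[-2], chain[-1])
        if not r:
            return chain
        chain.append([-c for c in r])


def sign_changes(chain, x):
    signs = [v > 0 for v in (evaluate(q, x) for q in chain) if v != 0]
    return sum(s != t for s, t in zip(signs, signs[1:]))


def isolate_roots(chain, a, b):
    """Intervals (a, b] that each contain exactly one root of chain[0]."""
    count = sign_changes(chain, a) - sign_changes(chain, b)
    if count == 0:
        return []
    if count == 1:
        return [(a, b)]
    mid = (a + b) / 2
    return isolate_roots(chain, a, mid) + isolate_roots(chain, mid, b)


def local_maxima_intervals(p, a, b):
    """Isolating intervals for the local maxima of p inside (a, b]."""
    dp = derivative([Fraction(c) for c in p])
    chain = sturm_chain(dp)           # chain[1] is the second derivative
    result = []
    for lo, hi in isolate_roots(chain, Fraction(a), Fraction(b)):
        # Sign of p' just right of lo; if lo is itself a root of p', use p''.
        slope = evaluate(dp, lo) or evaluate(chain[1], lo)
        if slope > 0:                 # p' goes from + to -: a maximum
            result.append((lo, hi))
    return result


# p(x) = -x^4 + 2x^2 - 1 has maxima at x = -1 and x = 1, a minimum at x = 0.
intervals = local_maxima_intervals([-1, 0, 2, 0, -1], -2, 2)
print(intervals)
```

Each returned interval can be narrowed further by continuing the bisection with the same sign-change count, or handed to a fast local method once the root is known to be isolated.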

Mathematica - Solving for the input of a taylor series such that coefficients are minimized

I need to find the value of a variable s such that the Taylor expansion of an expression involving s:
Has a minimum (preferably zero, but due to binary minimum is sufficient) in as many coefficients other than 0th order as possible (preferably more than that one minimum coefficient, but 2nd and 3rd have priority).
reports the best n values of s that fulfill the condition within the region (ie show me the 3 best values of s and what the coefficients look like for each).
I have no idea how to even get the output of a Series[] command into any other Mathematica command without receiving an error, much less how to actually solve the problem. The equation I am working with is too complex to post here (multi-regional but continuous polynomial expression that can be expanded). Does anyone know what commands to use for this?
The first thing you should realize is that the output of Series is not a sum but a SeriesData object. To convert it into a sum you have to wrap it in Normal, as in Normal[Series[...]]. Since the question doesn't provide details, I can't say more.
