Obtaining the functional form of a curve - algorithm

The following is the plot of a curve f(r), where r is the radial coordinate, plotted for different values of a parameter as shown:
However, I don't know the functional form of the curve and I am interested in finding it. Are there any numerical methods that can be used to determine the functional form of f(r) in terms of the radial coordinate and the parameter?

I found a solution to the problem based on the suggestion by ja72 to use the Eureqa software, which churns through the data to create accurate predictive models using an evolutionary search algorithm.
In the question, the different curves correspond to different values of the parameter. So, initially I obtained the best-fit equation for different values of the parameter and found that the following model equation is suitable for my purpose:
Then, I repeated the process for a large number of values of the parameter, calculated the values of the four functions for each of those values, and then individually fitted these four functions. The following are the results that I obtained:
N.B.: Eureqa gave several other better-fitting formulas than those mentioned in the answer, but the formulas I mentioned are sufficiently accurate for my purpose and have minimal complexity.

A blind curve fit without an underlying model is a dangerous thing.
You need to have an understanding of the physical model behind the data to create a successful fit. The reason is that if r is a distance and the best-fit curve uses, for example, r^0.4072, then a dimensioned quantity raised to a decimal power bears no meaning and hides underlying assumptions, such as some other length scale l not included in the model, whereas only the dimensionless quantity (r/l) would make sense to raise to a decimal power.
From a function analysis standpoint
These curves are not the result of any standard math function. Granted, I am not that familiar with Bessel functions, gamma functions, and Legendre polynomials, but none of the standard functions you find on a scientific calculator jumps out here.
If r is assumed to be dimensionless, then you try to match the asymptotic behavior as r -> 0 and as r -> ∞. That would be the baseline curve. To me it does not look hyperbolic, but rather close to 1/LN(1+r).
So change variables: let g(r) = 1/LN(1+r), plot f(r) against g(r), and see what that looks like. Then try another round of curve fitting on the new curves, and so on.
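To make the change-of-variables idea concrete, here is a minimal Python sketch (assuming numpy/scipy are available); the sample data and the affine model in g are placeholders for illustration, not the asker's actual f(r):

import numpy as np
from scipy.optimize import curve_fit

# Placeholder data standing in for one sampled curve f(r); replace with the real values.
r = np.linspace(0.1, 10.0, 200)
f = 1.0 / np.log(1.0 + r) + 0.05 * np.random.randn(r.size)

# The suggested change of variables: g(r) = 1/LN(1+r).
g = 1.0 / np.log(1.0 + r)

# If f is roughly linear in g, a simple affine model in the new variable may be enough.
def model(g, a, b):
    return a * g + b

(a, b), _ = curve_fit(model, g, f)
print(a, b, np.max(np.abs(f - model(g, a, b))))

If the plot of f against g still looks curved, repeat the process with another change of variables, as suggested above.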
Nobody can answer this question
Nobody else could effectively answer this question but you, because a) you have the data, and b) you need to make assumptions about which regions are important and what deviation is acceptable.


What is the exact difference between a model and an algorithm?
Let us take logistic regression as an example. Is logistic regression a model or an algorithm, and why?
An algorithm is the general approach you will take. The model is what you get when you run the algorithm over your training data and what you use to make predictions on new data.
You can generate a new model with the same algorithm but with different data, or you can get a new model from the same data but with a different algorithm.
Do you like Ferrari? They have a very nice 812 Superfast model, but they also have other models. Every model is different and leads to a different behavior and experience.
Think of a model more like a mathematical description of a system: an equation that gives you a general way to achieve your vision or idea. For example,
y = m*x + b
is a model function that yields a straight line (see least squares linear regression).
Whereas an algorithm is a set of actions (or rules) that you need to perform in order to implement your vision. For example, the famous minimax algorithm often used in AI game players that have to choose the next move.
To finish my above idea, imagine that a Ferrari model is an already existing idea on paper, and an algorithm is a robot in a factory that performs its set of programmed actions; it is a sequence of actions. This is naively speaking, of course, but hopefully you get the idea.
An algorithm is a mathematical formula like linear regression for example. Linear regression (with one variable) defines a line in 2-D space. But the slope and position of the line cannot be determined unless some sample values are available to solve the equation.
This regression line can be represented mathematically as y = mx + a.
Once sample values (or training data) are applied to solve this equation, the line can be drawn in 2-D space.
This line now becomes the model with known slope (m) and intercept (a). Using this model, the value of y (label) can be determined for a given value of x (feature).
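A tiny Python sketch of that distinction (the training numbers are made up): running the least-squares fitting procedure over the training data produces the model, i.e. the concrete values of m and a, which is what you then use for prediction.

import numpy as np

# Training data (hypothetical sample values).
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# The algorithm: least-squares fitting of a degree-1 polynomial.
m, a = np.polyfit(x_train, y_train, deg=1)

# The model: the learned slope m and intercept a, i.e. y = m*x + a.
def predict(x):
    return m * x + a

print(predict(5.0))   # use the model to predict the label for a new feature value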

Theory on how to find the equation of a curve given a variable number of data points

I have recently started working on a project. One of the problems I ran into was converting changing accelerations into velocity. Accelerations at different points in time are provided through sensors. If you get the equation of these data points, the derivative at a certain time (x) of that equation will be the velocity.
I know how to do this on the computer, but how would I get the equation to start with? I have searched around but I have not found any existing programs that can form an equation given a set of points. In the past, I created a neural-net algorithm to form an equation, but it takes an incredibly long time to run.
If someone can link me to a program or explain the process of doing this, that would be fantastic.
Sorry if this is in the wrong forum. I would post it in math, but a programming background is needed to know what a computer can realistically do quickly.
This started out as a comment but ended up being too big.
Just to make sure you're familiar with the terminology...
Differentiation takes a function f(t) and spits out a new function f'(t) that tells you how f(t) changes with time (i.e. f'(t) gives the slope of f(t) at time t). This takes you from displacement to velocity or from velocity to acceleration.
Integration takes a function f(t) and spits out a new function F(t) which measures the area under f(t) from the beginning of time up until a given point t. What's not obvious at first is that integration is actually the reverse of differentiation, a fact called the Fundamental Theorem of Calculus. So integration takes you from acceleration to velocity or from velocity to displacement.
You don't need to understand the rules of calculus to do numerical integration. The simplest (and most naive) method for integrating a function numerically is to approximate the area by dividing it up into small slices between time points and summing the areas of rectangles. This approximating sum is called a Riemann sum.
As you can see, this tends to really overshoot and undershoot certain parts of the function. A more accurate but still very simple method is the trapezoid rule, which also approximates the function with a series of slices, except the tops of the slices are straight lines between the function values rather than constant values.
Still more complicated, but a better approximation yet, is Simpson's rule, which approximates the function with parabolas between time points.
You can think of each of these methods as getting a better approximation of the integral because they each use more information about the function. The first method uses just one data point per area (a constant flat line), the second method uses two data points per area (a straight line), and the third method uses three data points per area (a parabola).
You could read up on the math behind these methods here or in the first page of this pdf.
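For reference, here is a short Python sketch of the rectangle and trapezoid approaches applied to sampled "acceleration" data; the signal is a made-up stand-in for sensor readings, chosen so the exact answer is known:

import numpy as np

# Hypothetical samples: times t (s) and accelerations a (m/s^2).
t = np.linspace(0.0, 2.0, 21)
a = 3.0 * np.sin(2.0 * t)

dt = np.diff(t)

# Riemann (rectangle) sum: each slice uses only the left-hand sample.
v_rect = np.concatenate(([0.0], np.cumsum(a[:-1] * dt)))

# Trapezoid rule: each slice uses a straight line between the two samples.
v_trap = np.concatenate(([0.0], np.cumsum(0.5 * (a[:-1] + a[1:]) * dt)))

# Exact velocity for this test signal: v(t) = 1.5 * (1 - cos(2t)).
v_exact = 1.5 * (1.0 - np.cos(2.0 * t))
print(np.max(np.abs(v_rect - v_exact)), np.max(np.abs(v_trap - v_exact)))

The trapezoid error comes out noticeably smaller for the same samples, which is the point of the comparison above.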
I agree with the comments that numerical integration is probably what you want. In case you still want a function going through your data, let me further argue against doing that.
It's usually a bad idea to find a curve that goes exactly through some given points. In almost any applied math context you have to accept that there is a little noise in the inputs, and a curve going exactly through the points may be very sensitive to that noise. This can produce garbage outputs. Finding a curve going exactly through a set of points is asking for overfitting: you get a function that memorizes the data rather than understanding it, and it does not generalize.
For example, take the points (0,0), (1,1), (2,4), (3,9), (4,16), (5,25), (6,36). These are seven points on y=x^2, which is fine. The value of x^2 at x=-1 is 1. Now what happens if you replace (3,9) with (2.9,9.1)? There is a sixth order polynomial passing through all 7 points,
4.66329x - 8.87063x^2 + 7.2281x^3 - 2.35108x^4 + 0.349747x^5 - 0.0194304x^6.
The value of this at x=-1 is -23.4823, very far from 1. While the curve looks ok between 0 and 2, in other examples you can see large oscillations between the data points.
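You can reproduce this experiment in a few lines of Python (the fitted coefficients may differ from the ones above in the last digits due to floating point):

import numpy as np

# Seven points on y = x^2, with (3, 9) replaced by (2.9, 9.1).
x = np.array([0.0, 1.0, 2.0, 2.9, 4.0, 5.0, 6.0])
y = np.array([0.0, 1.0, 4.0, 9.1, 16.0, 25.0, 36.0])

# Degree-6 polynomial through all seven points (an exact interpolation).
p = np.poly1d(np.polyfit(x, y, deg=6))

print(p(-1.0))    # far from 1, the value of x^2 at x = -1
print(p(x) - y)   # essentially zero: the curve really does pass through every point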
Once you accept that you want an approximation, not a curve going exactly through the points, you have what is known as a regression problem. There are many types of regression. Typically, you choose a set of functions and a way to measure how well a function approximates the data. If you use a simple set of functions like lines (linear regression), you just find the best fit. If you use a more complicated family of functions, you should use regularization to penalize overly complicated functions such as high degree polynomials with large coefficients that memorize the data. If you either use a simple family or regularization, the function tends not to change much when you add or withhold a few data points, which indicates that it is a meaningful trend in the data.
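As a concrete (hypothetical) illustration of that last point, here is a ridge-regularized polynomial regression written with plain numpy; the data, the degree, and the penalty strength lam are all assumptions made for the sketch, and the intercept is penalized too, which a production implementation would usually avoid:

import numpy as np

# Noisy samples of an underlying smooth trend (made-up data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

degree = 9
lam = 1e-3   # regularization strength, hand-picked for the example

# Polynomial design matrix: columns 1, x, x^2, ..., x^degree.
X = np.vander(x, degree + 1, increasing=True)

# Ridge regression: minimize ||X*b - y||^2 + lam*||b||^2,
# i.e. solve (X^T X + lam*I) b = X^T y.
b = np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

print(np.round(b, 3))   # the penalty keeps the high-degree coefficients modest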
Unfortunately, integrating accelerometer data to get velocity is a numerically unstable problem. For most applications, your error will diverge far too soon to get results of any practical value.
Recall that velocity is the time integral of acceleration:
v(t) = v(0) + ∫ a(τ) dτ
So any error in the measured acceleration gets integrated right along with the signal.
However well you fit a function to your accelerometer data, you will still essentially be doing a piecewise interpolation of the underlying acceleration function:
v(t_n) ≈ v(0) + Σ (a_i + error_i) * Δt_i
where the error terms from each integration step add!
Typically you will see wildly inaccurate results after just a few seconds.
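A small simulation makes the drift easy to see; the sampling rate, noise level, and bias below are assumptions made for the sketch, not properties of any particular sensor:

import numpy as np

rng = np.random.default_rng(1)

dt = 0.01                                   # assume 100 Hz sampling
t = np.arange(0.0, 10.0, dt)
a_true = np.zeros_like(t)                   # the sensor is sitting still
a_meas = a_true + 0.02 + 0.05 * rng.standard_normal(t.size)   # assumed bias + noise

# Trapezoid-rule integration of the measured acceleration.
v_est = np.concatenate(([0.0], np.cumsum(0.5 * (a_meas[:-1] + a_meas[1:]) * dt)))

# The true velocity is identically zero, so v_est is pure accumulated error.
for seconds in (1, 5, 10):
    i = int(seconds / dt) - 1
    print(seconds, v_est[i])

The constant bias alone makes the velocity error grow linearly with time, and a second integration to position would make it grow quadratically.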

Is it possible to calculate the mathematical function of a 2D image?

The question basically says it all. I would like to add that, supposing I have an image, a photograph, I wish to calculate its mathematical function, so that when I input x and y pixel values, it returns a vector consisting of the R, G, B values at that (x, y) point. I could then use a for loop to reconstruct the whole image from just that function. I am not asking for the whole solution or algorithm here, just whether this is possible and, if so, which direction I should take to go about doing it. References to relevant papers would be really nice.
Yes, it is absolutely always possible. Basically, if you choose some points, there are always (infinitely many) smooth explicit functions (that is, nice functions) whose values at those points are exactly the ones you chose.
For example, you can have a look at http://en.wikipedia.org/wiki/Lagrange_polynomial or http://en.wikipedia.org/wiki/Trigonometric_interpolation. They are two different methods to compute an explicit function which passes exactly through the data points you have. So you can apply those methods to your image, seen as a set of data points, separately for R, G, and B.
In the end, you get one explicit function (a polynomial or a trigonometric series, depending on what you chose), and you can compute its values wherever you want.
However, note that I would definitely not recommend using those methods to actually reconstruct the data. The functions you get are not at all optimized: for an n×m image, each color channel gets a polynomial of degree nm-1 with very large coefficients, and the function will take extremely large values between your original points (look up Runge's phenomenon).
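To see what this means in practice, here is a tiny 1D Python sketch using scipy.interpolate.lagrange on a handful of made-up pixel values from one channel of one scanline; a full image would behave the same way, only far worse:

import numpy as np
from scipy.interpolate import lagrange

# A few red-channel values along one scanline (made-up data).
x = np.arange(6)                          # pixel x-coordinates
r = np.array([12.0, 50.0, 48.0, 200.0, 180.0, 30.0])

poly = lagrange(x, r)                     # degree-5 polynomial through all 6 samples

print(np.round(poly(x) - r, 6))           # ~0 at the original pixels
print(poly(2.5))                          # a value between pixels; nothing constrains it to look like image data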
This is not possible in general... Imagine an image that has been generated by assigning random values to each pixel. You can't find a mathematical expression that will give you the value of a pixel given its 2D coordinates.
Now it may be possible for some images that have been generated using a function. In that case, it's not a problem specific to image processing; it's recovering a function from some of its points (in your case, you have all the points). It's exactly the same thing as extrapolating a curve from a set of points when you trace a graph in Excel. The more points you have, the more precise the function you find will be.
Look for information about regression analysis. I can't help you much, but there are algorithms that exist for this.

why overfitting gives a bad hypothesis function

In linear or logistic regression, if we find a hypothesis function which fits the training set perfectly, then it should be a good thing, because in that case we have used 100% of the given information to predict new information.
Yet this is called overfitting and is said to be a bad thing.
By making the hypothesis function simpler, we may actually be increasing the noise instead of decreasing it.
Why is it so?
Overfitting occurs when you try "too hard" to make the examples in the training set fit the classification rule.
It is considered a bad thing for two main reasons:
The data might have noise. Trying too hard to classify 100% of the examples correctly will make the noise count and give you a bad rule, while ignoring this noise would usually be much better.
Remember that the classified training set is just a sample of the real data. A solution that fits it perfectly is usually more complex than what you would have got if you had tolerated a few wrongly classified samples. According to Occam's razor, you should prefer the simpler solution, so ignoring some of the samples will be better.
Example:
According to Occam's razor, you should tolerate the misclassified sample, assume it is noise or insignificant, and adopt the simple solution (the green line in the figure) for this data set.
Because you actually didn't "learn" anything from your training set, you've just fitted to your data.
Imagine you have a one-dimensional regression
x_1 -> y_1
...
x_n -> y_n
The function defined by
f(x) = y_i, if x = x_i for some i
f(x) = 0, otherwise
will give you a perfect fit, but it's actually useless.
Hope this helped a bit :)
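In Python, that lookup-table "model" is literally this (the training pairs are made up):

# Memorizes the training pairs exactly and returns 0 for anything unseen.
train = {0.0: 1.2, 1.0: 3.4, 2.0: 5.1}    # x_i -> y_i

def f(x):
    return train.get(x, 0.0)

print(f(1.0))    # perfect "fit" on a training point
print(f(1.5))    # useless on unseen data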
Assuming that your regression accounts for all sources of deviation in your data, you might argue that your regression perfectly fits the data. However, if you knew all (and I mean all) of the influences on your system, you probably wouldn't need a regression; you would likely have an analytic solution that perfectly predicts new information.
In actuality, the information you possess falls short of this perfect level. Noise (measurement error, partial observability, etc.) will cause deviation in your data. In response, a regression (or other fitting mechanism) should seek the general trend of the data while minimizing the influence of noise.
Actually, the statement is not quite correct as written. It is perfectly fine to match 100% of your data if your hypothesis function is linear. Every continuous nonlinear function may be approximated locally by a linear function, which gives important information on its local behavior.
It is also fine to match 100 points of data to a quadratic curve if that data matches 100%. You can have high confidence that you are not overfitting your data, since the data consistently shows quadratic behavior.
However, one can always get a 100% fit by using a polynomial function of high enough degree. Even without the noise that others have pointed out, though, you shouldn't assume your data has some high-degree polynomial behavior without some kind of theoretical or experimental confirmation of that hypothesis. Two good indicators of genuine polynomial behavior are:
You have some theoretical reason for expecting the data to grow as x^n in one of the directional limits.
You have data that has been supporting a fixed degree polynomial fit as more and more data has been collected.
Notice, though, that even though exponential and reciprocal relationships may have data that fits a polynomial of high enough degree, they don't tend to obey either of the two conditions above.
The point is that your data fit needs to be useful to prediction. You always know that a linear fit will give information locally, but that information becomes more useful the more points are fit. Even if there are only two points and noise, a linear fit still gives the best theoretical look at the data collected so far, and establishes the first expectations of the data. Beyond that, though, using a quadratic fit for three points or a cubic fit for four is not validly giving more information, as it assumes both local and asymptotic behavior information with the addition of one point. You need justification for your hypothesis function. That justification can come from more points or from theory.
(A third reason that sometimes comes up is
You have theoretical and experimental reason to believe that error and noise do not contribute more than some bounds, and you can take a polynomial hypothesis to look at local derivatives and the behavior needed to match the data.
This is typically used in understanding data to build theoretical models without having a good starting point for theory. You should still strive to use the smallest polynomial degree possible, and look to substitute out patterns in the coefficients with what they may indicate (reciprocal, exponential, gaussian, etc.) in infinite series.)
Try imagining it this way. You have a function from which you pick n different values to represent a sample / training set:
y(n) = x(n), n is element of [0, 1]
But, since you want to build a robust model, you want to add a little noise to your training set, so you actually add a little noise when generating the data:
data(n) = y(n) + noise(n) = x(n) + u(n)
where by u(n) I denote uniform random noise with mean 0 and standard deviation 1: U(0,1). Quite simply, it's a noise signal which is most likely to take the value 0, and less likely to take a value the farther that value is from 0.
And then you draw, let's say, 10 points to be your training set. If there were no noise, they would all lie on the line y = x. Since there was noise, the lowest-degree polynomial that can pass through all of them exactly is generally of 9th order, a function like y = a_9 * x^9 + a_8 * x^8 + ... + a_1 * x + a_0.
If instead you used a simpler estimate based on the information in the training set, you would probably get a simpler function than the 9th-order polynomial, and it would be closer to the real function.
Consider further that your real function can take values outside the [0, 1] interval, but for some reason the samples for the training set could only be collected from this interval. A simple estimate would probably behave significantly better outside the interval of the training set, while if we fit the training set perfectly we would get an overfitted function that meanders with lots of ups and downs all over :)
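Here is a quick Python version of that thought experiment (the noise scale and test points are arbitrary choices): fit the ten noisy samples both with an interpolating 9th-order polynomial and with a straight line, then evaluate outside the training interval.

import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 10)
y = x + 0.05 * rng.standard_normal(x.size)    # y = x plus a little noise

p_over = np.poly1d(np.polyfit(x, y, deg=9))   # passes (essentially) through every sample
p_line = np.poly1d(np.polyfit(x, y, deg=1))   # the simpler estimate

for x_new in (1.5, 2.0):                      # outside the training interval [0, 1]
    print(x_new, p_over(x_new), p_line(x_new), "true:", x_new)

(numpy may warn that the degree-9 fit is poorly conditioned, which is itself a hint that the model is too flexible for the data.)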
Overfitting is termed bad because of the bias-variance tradeoff. An overfit solution fits the training data 100%, but adding even a small number of new data points changes the model drastically; this sensitivity is the variance of the model. Hence the bias-variance tradeoff, where we try to balance the two factors so that the model does not change drastically on small data changes but still predicts the output reasonably well.

Accurate least-squares fit algorithm needed

I've experimented with the two ways of implementing a least-squares fit (LSF) algorithm shown here.
The first code is simply the textbook approach, as described by Wolfram's page on LSF. The second code rearranges the equation to minimize machine errors. Both codes produce similar results for my data. I compared these results with Matlab's p=polyfit(x,y,1) function, using correlation coefficients to measure the "goodness" of fit and compare the three routines. I observed that while all three methods produced good results, at least for my data, Matlab's routine had the best fit (the other two routines gave results similar to each other).
Matlab's p=polyfit(x,y,1) function uses a Vandermonde matrix, V (n x 2 matrix) and QR factorization to solve the least-squares problem. In Matlab code, it looks like:
V = [x(:), ones(numel(x),1)]; % Vandermonde matrix for a degree-1 fit: columns x and 1
[Q,R] = qr(V,0);
p = R\(Q'*y); % performs same as p = V\y
I'm not a mathematician, so I don't understand why it would be more accurate. Although the difference is slight, in my case I need to obtain the slope from the LSF and multiply it by a large number, so any improvement in accuracy shows up in my results.
For reasons I can't get into, I cannot use Matlab's routine in my work. So, I'm wondering if anyone has a more accurate equation-based approach recommendation I could use that is an improvement over the above two approaches, in terms of rounding errors/machine accuracy/etc.
Any comments appreciated! Thanks in advance.
For a polynomial fit, you can create a Vandermonde matrix and solve the linear system, as you have already done.
Another solution is to use a method like Gauss-Newton to fit the data (since the system is linear, one iteration should do fine). There are differences between the methods; one possible reason is Runge's phenomenon.
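One equation-based approach that is usually more accurate than the textbook sums is to center the data before forming the sums. Here is a sketch in Python (straightforward to port to another language), with a comparison against numpy's polyfit on deliberately offset data:

import numpy as np

def stable_line_fit(x, y):
    # Least-squares slope and intercept using centered sums.
    # Subtracting the means before forming the sums avoids much of the
    # cancellation that hurts the textbook formula
    #   m = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2).
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    dx, dy = x - xm, y - ym
    m = np.dot(dx, dy) / np.dot(dx, dx)    # slope
    a = ym - m * xm                        # intercept
    return m, a

# Data with a large offset in x, the regime where the textbook sums lose precision.
x = 1.0e6 + np.linspace(0.0, 1.0, 1000)
y = 3.0 * x + 2.0
print(stable_line_fit(x, y))
print(np.polyfit(x, y, 1))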

Resources