How to find the best (most informative) plotting limits? - algorithm

I am developing a 2D plotting program for functions of 1 variable. It is designed to be very simple so that the user should not have to select the initial plot limits (or "range").
Are there known algorithms that can find the most interesting plot limits, knowing only the function f(x) ?
Notes:
"Interesting plot limits" is deliberately not well defined here. That is part of the question: what is the most interesting part of a plot?
I already have an algorithm to determine the range of x values where the function f has finite values.
I am using Javascript, but any language is ok.
I don't want to use existing libraries.
The function f is restricted to expressions that the user can write with the basic math operators + - * / ^ and functions exp log abs sqrt sin cos tan acos asin atan ceil floor.
Google's built-in grapher gives some examples of automatic limits. Typing graph sin(x) works pretty well, but graph exp(x) and graph log(x) don't really give the best results, and graph sin(x*100)*exp(-x^2) does not choose the limits I would call the most informative. Still, that level of quality would be good enough for me.
UPDATE:
I found that PlotRange in Mathematica does that automatically very well (see here). Is the source code available, or a reference explaining the algorithm? I could not find it anywhere.
UPDATE:
I started using an adaptive refinement algorithm to find informative plot ranges, inspired by this site. It is not working perfectly yet, but the current progress is implemented in my project here. You can try plotting a few functions and see how it works. When I have a fully working version I can post an answer.
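Here is a rough sketch of the general idea (not my actual project code): recursively split an interval wherever the function's midpoint deviates too much from the straight line between the endpoints, so flat regions get few samples and oscillating regions get many. The tolerance and depth limit below are arbitrary choices.

```javascript
// Adaptive-refinement sampler (sketch). Subdivide [x0, x1] wherever the
// midpoint value deviates from what a straight segment would predict.
function adaptiveSample(f, x0, x1, tol = 1e-3, depth = 0, maxDepth = 12) {
  const xm = (x0 + x1) / 2;
  const y0 = f(x0), y1 = f(x1), ym = f(xm);
  const linearGuess = (y0 + y1) / 2; // value predicted by a straight segment
  if (depth >= maxDepth || Math.abs(ym - linearGuess) <= tol) {
    return [[x0, y0], [xm, ym]]; // the right endpoint is appended by the caller
  }
  return adaptiveSample(f, x0, xm, tol, depth + 1, maxDepth)
    .concat(adaptiveSample(f, xm, x1, tol, depth + 1, maxDepth));
}

// Example: sin(100x)*exp(-x^2) gets dense samples near its oscillations.
const pts = adaptiveSample(x => Math.sin(100 * x) * Math.exp(-x * x), -3, 3);
pts.push([3, Math.sin(300) * Math.exp(-9)]); // append the right endpoint
```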

I don't have a complete answer, but I might have some useful ideas to start with.
For me, interesting parts of a graph include:
All the roots of the function, except where there is an infinite number of roots (in which case we might be interested in no more than, say, 8 of them).
All the roots of the first and second derivatives of the function, again except where there is an infinite number of roots.
The behaviour of the function around x = 0.
Locations of asymptotes, though I wouldn't want the graph to plot all the way to infinity.
To see the features of the graph, I'd like it to take up a "reasonable" amount of the rectangular graphing window. That might be achieved by aiming to have the integral of the absolute value of the function over the plotted range be, say, 20-80% of the area of the graphing window.
Thus, a sketch of a heuristic for setting plot limits might be something like the following (a code sketch follows the list):
Find the range that includes all the roots of the function and of its first and second derivatives, or (for functions with infinite numbers of roots) the, say, 8 roots closest to x=0.
If the range does not already include x=0, expand the range to include x=0.
Expand the x range by, say, 10% in each direction to ensure that all "interesting" points are well within the window.
Set the y range so that the integral of the absolute value of the function is (say) 30% of the area of the graphing window.
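Here is a rough JavaScript sketch of those four steps. The function names, the scan interval [-10, 10], and all the constants are arbitrary starting points of mine, not a tested implementation; roots are located by sign changes on a coarse grid, derivatives are taken numerically, and the y range is centered on 0 for simplicity.

```javascript
// Locate up to maxRoots roots of f on [a, b] by scanning for sign changes.
function findRoots(f, a, b, n = 2000, maxRoots = 8) {
  const roots = [];
  let xPrev = a, yPrev = f(a);
  for (let i = 1; i <= n && roots.length < maxRoots; i++) {
    const x = a + ((b - a) * i) / n, y = f(x);
    if (yPrev * y <= 0 && yPrev !== y) roots.push((xPrev + x) / 2); // sign change
    xPrev = x; yPrev = y;
  }
  return roots;
}

function autoLimits(f, a = -10, b = 10) {
  const h = 1e-4; // step for numerical derivatives
  const f1 = x => (f(x + h) - f(x - h)) / (2 * h);
  const f2 = x => (f(x + h) - 2 * f(x) + f(x - h)) / (h * h);
  // Step 1: roots of f, f' and f'' (at most 8 of each).
  const pts = [...findRoots(f, a, b), ...findRoots(f1, a, b), ...findRoots(f2, a, b)];
  // Step 2: make sure x = 0 is included.
  let xMin = Math.min(0, ...pts), xMax = Math.max(0, ...pts);
  if (xMax - xMin < 1e-9) { xMin = -1; xMax = 1; } // degenerate fallback
  // Step 3: expand by 10% on each side.
  const pad = 0.1 * (xMax - xMin);
  xMin -= pad; xMax += pad;
  // Step 4: choose the window height so the integral of |f| is ~30% of
  // the window area (midpoint Riemann sum).
  const n = 500, dx = (xMax - xMin) / n;
  let area = 0;
  for (let i = 0; i < n; i++) {
    const y = f(xMin + (i + 0.5) * dx);
    if (isFinite(y)) area += Math.abs(y) * dx;
  }
  const height = area / (0.3 * (xMax - xMin));
  return { xMin, xMax, yMin: -height / 2, yMax: height / 2 };
}

console.log(autoLimits(Math.sin)); // reasonable limits for sin(x)
```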

Related

Obtaining the functional form of a curve

Below is a plot of a curve f(r), where r is the radial coordinate, drawn for different values of a parameter.
However, I don't know the functional form of the curve and would like to find it. Are there any numerical methods that can be used to find the functional form of f(r) in terms of the radial coordinate and the parameter?
I found a solution to the problem based on ja72's suggestion to use the Eureqa software, which churns through the data to create accurate predictive models using an evolutionary search algorithm.
In the question, the different curves correspond to different values of the parameter. So, initially I obtained the best-fit equation for several values of the parameter and found a model equation that is suitable for my purpose.
Then I repeated the process for a large number of parameter values, calculated the values of the model's four coefficient functions at each, and individually fitted those four functions to obtain the results I use.
N.B.: Eureqa gave several other better-fitting formulas than those mentioned in this answer, but the formulas I chose are sufficiently accurate for my purpose and have minimal complexity.
A blind curve fit without an underlying model is a dangerous thing.
You need an understanding of the physical model behind the data to create a successful fit. The reason is that if r is a distance and the best-fit curve uses r^0.4072, for example, a dimensioned quantity raised to a decimal power has no meaning, and it hides underlying assumptions, like some other length l not included in the model: only the dimensionless quantity (r/l) would make sense to raise to a decimal power.
From a function analysis standpoint
These curves are not the result of any standard math function. Granted, I am not that familiar with Bessel functions, gamma functions, and Legendre polynomials, but none of the standard functions you find on a scientific calculator jumps out here.
If r is assumed to be dimensionless, then try to match the asymptotic behavior as r -> 0 and as r -> ∞. That would be the baseline curve. To me it does not look hyperbolic, but rather close to 1/ln(1+r).
So change variables: make g = 1/ln(1+r), plot f(r) against g(r), and see what that looks like. Then try another round of curve fitting on the new curves, and so on.
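For instance, here is a tiny sketch of that change of variables (the data arrays are made up purely for illustration):

```javascript
// Made-up sample data standing in for the unknown curve f(r).
const r = [0.5, 1, 2, 4, 8, 16];
const fOfR = r.map(ri => 1.05 / Math.log(1 + ri)); // pretend f is close to 1/ln(1+r)

// Change of variables: g = 1/ln(1+r), then look at f against g.
const g = r.map(ri => 1 / Math.log(1 + ri));
const pairs = g.map((gi, i) => [gi, fOfR[i]]);
// If the baseline guess is right, these pairs fall near the line f = g,
// and any systematic deviation suggests the next correction to fit.
console.log(pairs);
```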
Nobody can answer this question
Nobody else can effectively answer this question but you, because (a) you have the data, and (b) you have to make assumptions about which regions are important and what deviation is acceptable.

Theory on how to find the equation of a curve given a variable number of data points

I have recently started working on a project. One of the problems I ran into was converting changing accelerations into velocity. Accelerations at different points in time are provided through sensors. If you get the equation of these data points, the derivative of a certain time (x) on that equation will be the velocity.
I know how to do this on the computer, but how would I get the equation to start with? I have searched around but I have not found any existing programs that can form an equation given a set of points. In the past, I have created a neural net algorithm to form an equation, but it takes an incredibly long time to run.
If someone can link me a program or explain the process of doing this, that would be fantastic.
Sorry if this is in the wrong forum. I would post into math, but a programming background will be needed to know the realm of possibility of what a computer can do quickly.
This started out as a comment but ended up being too big.
Just to make sure you're familiar with the terminology...
Differentiation takes a function f(t) and spits out a new function f'(t) that tells you how f(t) changes with time (i.e. f'(t) gives the slope of f(t) at time t). This takes you from displacement to velocity or from velocity to acceleration.
Integration takes a function f(t) and spits out a new function F(t) which measures the area under f(t) from the beginning of time up until a given point t. What's not obvious at first is that integration is actually the reverse of differentiation, a fact called the Fundamental Theorem of Calculus. So integration takes you from acceleration to velocity or from velocity to displacement.
You don't need to understand the rules of calculus to do numerical integration. The simplest (and most naive) method for integrating a function numerically is to approximate the area by dividing it into small slices between time points and summing the areas of rectangles. This approximating sum is called a Riemann sum.
This tends to overshoot and undershoot parts of the function. A more accurate but still very simple method is the trapezoid rule, which also approximates the function with a series of slices, except the tops of the slices are straight lines between the function values rather than constant values.
Still more complicated, but a better approximation yet, is Simpson's rule, which approximates the function with parabolas between time points.
You can think of each of these methods as getting a better approximation of the integral because they each use more information about the function. The first method uses just one data point per area (a constant flat line), the second method uses two data points per area (a straight line), and the third method uses three data points per area (a parabola).
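If it helps, here is a minimal trapezoid-rule integrator in JavaScript; the sample times and values are made up for illustration:

```javascript
// Trapezoid-rule integration of sampled acceleration into velocity.
// t[] holds sample times, a[] the accelerations; v0 is the initial
// velocity (it cannot be recovered from acceleration alone).
function integrateTrapezoid(t, a, v0 = 0) {
  const v = [v0];
  for (let i = 1; i < t.length; i++) {
    const dt = t[i] - t[i - 1];
    v.push(v[i - 1] + 0.5 * (a[i] + a[i - 1]) * dt); // area of one trapezoid
  }
  return v; // velocity at each sample time
}

// Example: constant acceleration of 2 m/s^2 sampled at 10 Hz for 1 s.
const t = Array.from({ length: 11 }, (_, i) => i / 10);
const a = t.map(() => 2);
console.log(integrateTrapezoid(t, a)); // ~[0, 0.2, 0.4, ..., 2.0]
```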
You could read up on the math behind these methods here or in the first page of this pdf.
I agree with the comments that numerical integration is probably what you want. In case you still want a function going through your data, let me further argue against doing that.
It's usually a bad idea to find a curve that goes exactly through some given points. In almost any applied math context you have to accept that there is a little noise in the inputs, and a curve going exactly through the points may be very sensitive to that noise, producing garbage outputs. Demanding an exact fit is asking for overfitting: you get a function that memorizes rather than understands the data, and does not generalize.
For example, take the points (0,0), (1,1), (2,4), (3,9), (4,16), (5,25), (6,36). These are seven points on y=x^2, which is fine. The value of x^2 at x=-1 is 1. Now what happens if you replace (3,9) with (2.9,9.1)? There is a sixth-degree polynomial passing through all 7 points,
4.66329x - 8.87063x^2 + 7.2281x^3 - 2.35108x^4 + 0.349747x^5 - 0.0194304x^6.
The value of this at x=-1 is -23.4823, very far from 1. While the curve looks ok between 0 and 2, in other examples you can see large oscillations between the data points.
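You can check this with a few lines of JavaScript evaluating that polynomial:

```javascript
// Coefficients of the sixth-degree interpolant quoted above,
// from the constant term up through x^6.
const c = [0, 4.66329, -8.87063, 7.2281, -2.35108, 0.349747, -0.0194304];
const p = x => c.reduce((sum, ci, i) => sum + ci * Math.pow(x, i), 0);
console.log(p(-1)); // ≈ -23.4823, far from x^2's value of 1
console.log(p(2));  // ≈ 4, fine near the data points
```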
Once you accept that you want an approximation, not a curve going exactly through the points, you have what is known as a regression problem. There are many types of regression. Typically, you choose a set of functions and a way to measure how well a function approximates the data. If you use a simple set of functions like lines (linear regression), you just find the best fit. If you use a more complicated family of functions, you should use regularization to penalize overly complicated functions such as high degree polynomials with large coefficients that memorize the data. If you either use a simple family or regularization, the function tends not to change much when you add or withhold a few data points, which indicates that it is a meaningful trend in the data.
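As a concrete example of the simple end of that spectrum, here is an ordinary least-squares line fit (the data below reuses the perturbed points from the earlier example):

```javascript
// Simple linear regression: find the slope and intercept minimizing the
// sum of squared residuals, instead of forcing a curve through every point.
function fitLine(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((s, x) => s + x, 0) / n;
  const my = ys.reduce((s, y) => s + y, 0) / n;
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    den += (xs[i] - mx) ** 2;
  }
  const slope = num / den;
  return { slope, intercept: my - slope * mx };
}

// A small perturbation in one point barely moves the fitted line,
// unlike the exact interpolant above.
console.log(fitLine([0, 1, 2, 2.9, 4, 5, 6], [0, 1, 4, 9.1, 16, 25, 36]));
```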
Unfortunately, integrating accelerometer data to get velocity is a numerically unstable problem. For most applications, your error will diverge far too soon to get results of any practical value.
Recall that velocity is the integral of acceleration: v(t) = v(0) + ∫ a(τ) dτ, integrated from 0 to t. So every error in the measured acceleration is integrated along with the signal and accumulates over time.
However well you fit a function to your accelerometer data, you will still essentially be doing a piecewise interpolation of the underlying acceleration function, integrating your approximation segment by segment, where the error terms from each integration will add!
Typically you will see wildly inaccurate results after just a few seconds.

Is it possible to calculate the mathematical function of a 2D image?

The question basically says it all, but to elaborate: suppose I have an image, a photograph, and I wish to calculate its mathematical function, so that when I input x and y pixel values it returns a vector of the R, G, B values at that point. I could then construct the whole image with a for loop over just that function. I am not asking for the whole solution or algorithm here, just whether this is possible and which direction I should take. References to relevant papers would be really nice.
Thanks
Azmuh
Yes, it is always possible. Basically, if you choose some points, there are always infinitely many smooth explicit functions (that is, nice functions) whose values at those points are exactly the ones you choose.
For example, you can have a look at http://en.wikipedia.org/wiki/Lagrange_polynomial or http://en.wikipedia.org/wiki/Trigonometric_interpolation. These are two different methods of computing an explicit function that passes exactly through the data points you have. You can apply them to your image, seen as a set of data points, separately for R, G, and B.
In the end, you get one explicit function (a polynomial or a trigonometric series, depending on what you chose), and you can compute its values wherever you want.
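To illustrate, here is a small one-variable Lagrange interpolation routine in JavaScript; for an image you would apply it over pixel coordinates, once per channel (but see the caveat below):

```javascript
// Lagrange interpolation: returns a function that passes exactly
// through the points (xs[i], ys[i]).
function lagrange(xs, ys) {
  return x => xs.reduce((sum, xi, i) => {
    let li = 1; // the i-th Lagrange basis polynomial evaluated at x
    for (let j = 0; j < xs.length; j++) {
      if (j !== i) li *= (x - xs[j]) / (xi - xs[j]);
    }
    return sum + ys[i] * li;
  }, 0);
}

const p = lagrange([0, 1, 2, 3], [5, 2, 7, 1]);
console.log(p(1));   // 2, exact at the data points
console.log(p(1.5)); // an interpolated value in between
```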
However, I would definitely not recommend using those methods to actually retrieve the data. The functions you get are not at all economical: they have a very high degree (for an n×m image, each color channel is a polynomial of degree nm-1) and very large coefficients, and they take extremely large values between your original points (look up Runge's phenomenon).
This is not possible in general... Imagine an image that has been generated with a random value for each pixel. You won't find a meaningful mathematical expression that gives you the value of a pixel from its 2D coordinates.
Now, it may be possible for some images that were generated using a function. In that case, it's not a problem specific to image processing; it's recovering a function from some of its points (in your case, you have all the points). It's exactly the same thing as fitting a curve to a set of points when you draw a graph in Excel. The more points you have, the more precise the function you find will be.
Look for information about regression analysis. I can't help you much more, but algorithms for this do exist.

Locally weighted logistic regression

I have been trying to implement a locally-weighted logistic regression algorithm in Ruby. As far as I know, no library currently exists for this algorithm, and there is very little information available, so it's been difficult.
My main resource has been the dissertation of Dr. Kan Deng, in which he described the algorithm in what I feel is pretty light detail. My work so far on the library is here.
I've run into trouble when trying to calculate B (beta). From what I understand, B is a ((d+1) x 1) vector that represents the local weighting for a particular point. After that, pi (the probability of a positive output) for that point is the sigmoid function applied using the B for that point. To get B, apply the Newton-Raphson algorithm iteratively a certain number of times, probably no more than ten.
Equation 4-4 on page 66, the Newton-Raphson update itself, doesn't make sense to me. Based on my understanding of what X and W are, (x.transpose * w * x).inverse * x.transpose * w should be a ((d+1) x N) matrix, which doesn't match up with B, which is ((d+1) x 1). The only way that would work, then, is if e were an (N x 1) vector.
At the top of page 67, under the picture, though, Dr. Deng just says that e is a ratio, which doesn't make sense to me. Is e Euler's constant, so that the ratio just happens to be 2.718:1, or is it something else? Either way, the explanation doesn't seem to suggest that it's a vector, which leaves me confused.
The use of pi' is also confusing to me. Equation 4-5, the derivative of the sigmoid function w.r.t. B, gives a constant multiplied by a vector, i.e. a vector. From my understanding, though, pi' is supposed to be a number, to be multiplied by w to form the diagonal of the weight matrix W.
So, my two main questions are: what is e on page 67, and is it the (N x 1) vector I need? And how does pi' in equation 4-5 end up being a number?
I realize that this is a difficult question to answer, so if there is a good answer then I will come back in a few days and give it a fifty point bounty. I would send an e-mail to Dr. Deng, but I haven't been able to find out what happened to him after 1997.
If anyone has any experience with this algorithm or knows of any other resources, any help would be much appreciated!
As far as I can see, this is just a version of Logistic regression in which the terms in the log-likelihood function have a multiplicative weight depending on their distance from the point you are trying to classify. I would start by getting familiar with an explanation of logistic regression, such as http://czep.net/stat/mlelr.pdf. The "e" you mention seems to be totally unconnected with Euler's constant - I think he is using e for error.
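To make the idea concrete, here is a sketch of logistic regression with per-instance weights, fitted by plain gradient ascent on the weighted log-likelihood. This is a simpler stand-in for the Newton-Raphson update in the dissertation, and the kernel, learning rate, and iteration count below are arbitrary choices:

```javascript
// Weighted logistic regression via gradient ascent (sketch).
// X: feature vectors with a leading 1 for the intercept,
// y: labels in {0,1}, w: per-instance weights (e.g. a distance kernel).
function weightedLogistic(X, y, w, lr = 0.1, iters = 2000) {
  const d = X[0].length;
  const sigmoid = z => 1 / (1 + Math.exp(-z));
  let beta = new Array(d).fill(0);
  for (let it = 0; it < iters; it++) {
    const grad = new Array(d).fill(0);
    for (let i = 0; i < X.length; i++) {
      const p = sigmoid(X[i].reduce((s, xij, j) => s + xij * beta[j], 0));
      for (let j = 0; j < d; j++) grad[j] += w[i] * (y[i] - p) * X[i][j];
    }
    beta = beta.map((b, j) => b + lr * grad[j]); // ascend the weighted log-likelihood
  }
  return beta;
}

// Toy example: weights favor points near the query location x = 0.5.
const X = [[1, 0], [1, 0.3], [1, 0.6], [1, 1]];
const y = [0, 0, 1, 1];
const w = X.map(xi => Math.exp(-Math.pow(xi[1] - 0.5, 2)));
console.log(weightedLogistic(X, y, w));
```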
If you can call Java from Ruby, you may be able to make use of the logistic classifier in Weka described at http://weka.sourceforge.net/doc.stable/weka/classifiers/functions/Logistic.html - this says "Although original Logistic Regression does not deal with instance weights, we modify the algorithm a little bit to handle the instance weights." If nothing else, you could download it and look at its source code. If you do this, note that it is a fairly sophisticated approach - for instance, they check beforehand to see if all the points actually lie pretty much in some subspace of the input space, and project down a few dimensions if they do.

Trilateration of a signal using Time Difference of Arrival

I am having some trouble finding or implementing an algorithm to locate a signal source. The objective of my work is to find the position of a sound emitter.
To accomplish this I am using three microphones. The technique I am using is multilateration, which is based on the time difference of arrival.
The time difference of arrival between each pair of microphones is found using cross-correlation of the received signals.
I have already implemented the algorithm to find the time difference of arrival, but my problem is more with how multilateration works; it's unclear to me from my reference, and I couldn't find any other good reference for this that is free/open.
If you have some references on how I can implement a multilateration algorithm, or some other trilateration algorithm I can use based on time difference of arrival, it would be a great help.
Thanks in advance.
The point you are looking for is the intersection of three hyperbolas. I am assuming 2D here, since you only use 3 receivers. Technically you could attempt a 3D solution, but as you likely have noise, I assume that if you wanted a 3D result you would have used 4 microphones (or more).
The Wikipedia page does some of the computation for you. It works in 3D; you just have to set z = 0 and solve the system of equations (7).
The system is overdetermined, so you will want to solve it in the least-squares sense (that is actually the point of using 3 receivers).
I can help you with multilateration in general.
Basically, if you want a solution in 3D, you have to have at least 4 points and the 4 distances from them. Two distances give you a circle of candidate solutions (the intersection of two spheres); three points give you 2 possible solutions (the intersection of three spheres); so, in order to have one solution, you need 4 spheres. Once you have some points (4+) and the distances to them (there is an easy way to transform the TDOA into a set of equations involving plain length distances rather than times), you need a way to solve the resulting set of equations. First, you need a cost function (or solution-error function, as I call it), which would be something like
err(x,y,z) = sum(i=1..n){ (sqrt[(x-xi)^2 + (y-yi)^2 + (z-zi)^2] - di)^2 }
where x, y, z are the coordinates of the current point in the numerical solution and xi, yi, zi and di are the coordinates of, and distance to, the i-th reference point. To solve this, my advice is NOT to use Gauss-Newton or Newton methods: they need the first and second derivatives of the function above, and those have discontinuities at some points in space, so the function is not smooth and those methods won't work. What will work is the direct-search family of optimization algorithms (for finding minima and maxima; in our case, you need the minimum of the error/cost function).
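For illustration, here is one simple member of that family, a compass (pattern) search, minimizing a least-squares version of the cost function above. The reference points, step sizes, and tolerances are made up:

```javascript
// Least-squares cost: sum of squared differences between the distance
// from p to each reference point and the measured distance d.
function err(p, refs) { // refs: [{x, y, z, d}, ...]
  return refs.reduce((s, r) => {
    const dist = Math.hypot(p[0] - r.x, p[1] - r.y, p[2] - r.z);
    return s + Math.pow(dist - r.d, 2);
  }, 0);
}

// Compass/pattern search: probe along each coordinate axis, keep any
// improvement, and halve the step when no probe improves the point.
// No derivatives are needed, so non-smoothness is not a problem.
function patternSearch(f, p0, step = 1, tol = 1e-6) {
  let p = p0.slice(), best = f(p);
  while (step > tol) {
    let improved = false;
    for (let i = 0; i < p.length; i++) {
      for (const s of [step, -step]) {
        const q = p.slice(); q[i] += s;
        const v = f(q);
        if (v < best) { p = q; best = v; improved = true; }
      }
    }
    if (!improved) step /= 2; // shrink the probe when stuck
  }
  return p;
}

// Example: true source at (1, 2, 3), with exact distances to 4 references.
const refs = [
  { x: 0, y: 0, z: 0, d: Math.hypot(1, 2, 3) },
  { x: 5, y: 0, z: 0, d: Math.hypot(4, 2, 3) },
  { x: 0, y: 5, z: 0, d: Math.hypot(1, 3, 3) },
  { x: 0, y: 0, z: 5, d: Math.hypot(1, 2, 2) },
];
console.log(patternSearch(p => err(p, refs), [0, 0, 0])); // ≈ [1, 2, 3]
```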
That should help anyone looking to solve a similar problem.
