Newton module in stdlib - what does it do? - ruby

BigDecimal has some modules which are hardly documented, like Newton.
"Solves the nonlinear algebraic equation system f = 0 by Newton’s
method. This program is not dependent on BigDecimal.
To call:
n = nlsolve(f,x) where n is the number of iterations required,
x is the initial value vector
f is an Object which is used to compute the values of the equations to be solved. "
And that's it. Googling didn't turn up anything I could understand. I'd like to see some sample code with a bit of not-too-math-heavy explanation, to get a better idea of what that weird thing at the bottom of the toolbox is.

Newton's Method is a way of approximating the root of an equation. It's pretty good, provided your function meets some continuity requirements.
The method is:
1. Take a starting point.
2. At that point, find a tangent line.
3. Figure out where that tangent line has a root; take that root as your new point.
4. If you've reached tolerance, return this point as the solution. If not, go back to step 1 using this as your new point.
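To answer the original question, here is a minimal usage sketch of the stdlib's Newton#nlsolve, solving the two-equation system x1^2 + x2^2 = 2, x1 = x2 (which has a root at (1, 1)). The shape of the function object follows the module's documented requirements (it must respond to values(x) and expose zero/one/two/ten/eps); the particular eps value and starting guess are my own choices:

```ruby
require 'bigdecimal'
require 'bigdecimal/newton'
include Newton

# Function object for nlsolve: must respond to values(x) and expose
# the constants zero, one, two, ten, eps used internally by the module.
class Function
  attr_reader :zero, :one, :two, :ten, :eps

  def initialize
    @zero = BigDecimal("0.0")
    @one  = BigDecimal("1.0")
    @two  = BigDecimal("2.0")
    @ten  = BigDecimal("10.0")
    @eps  = BigDecimal("1.0e-16")   # convergence tolerance
  end

  # The system to solve: f1 = x1^2 + x2^2 - 2 = 0, f2 = x1 - x2 = 0
  def values(x)
    [x[0] * x[0] + x[1] * x[1] - @two, x[0] - x[1]]
  end
end

f = Function.new
x = [f.one, f.two]   # initial guess (1, 2); must not make the Jacobian singular
n = nlsolve(f, x)    # x is refined in place; n is the iteration count
# x[0] and x[1] now both approximate the root at 1
```

Note that starting at (0, 0) would fail here, since the Jacobian of f1 vanishes there — the starting guess matters.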

Related

Display the exact form of quadratic equations' roots

I notice that almost all new calculators are able to display the roots of quadratic equations in exact form. For example:
x^2-16x+14=0
x1=8+5sqrt2
x2=8-5sqrt2
What algorithm could I use to achieve that? I've been searching around, but found nothing related to this problem.
Assuming your quadratic equation is in the form
y = ax^2 + bx + c
you get the two roots by
x_1, x_2 = (-b +- sqrt(b^2 - 4ac)) / (2a)
where for one root you use the + before the square root, and for the other the -.
If you want to take something out of the square root, compute the prime factorization of the argument and pull out every factor whose exponent is at least 2 (one copy for each pair).
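In code, taking square factors out of the root can be sketched by trial division (a minimal version; the helper name is mine):

```ruby
# Extract perfect-square factors from sqrt(n): returns [outside, inside]
# such that sqrt(n) == outside * sqrt(inside), with inside square-free.
def simplify_sqrt(n)
  outside = 1
  inside = n
  f = 2
  while f * f <= inside
    # pull out pairs of the factor f while inside is divisible by f^2
    while inside % (f * f) == 0
      inside /= f * f
      outside *= f
    end
    f += 1
  end
  [outside, inside]
end

simplify_sqrt(200)   # sqrt(200) = 10 * sqrt(2)
```

For x^2-16x+14=0 the discriminant is 16^2-4*14 = 200, simplify_sqrt(200) returns [10, 2] (i.e. sqrt(200) = 10*sqrt(2)), and dividing by 2a = 2 gives the exact roots 8 ± 5*sqrt(2).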
By the way, you can verify the roots you posted: the discriminant is 16^2-4*14 = 200, and sqrt(200)/2 = 5*sqrt(2), so x1 and x2 are indeed 8+5sqrt2 and 8-5sqrt2.
The “algorithm” is exactly the same as on paper. Depending on the programming language, it may start with int delta = b*b - 4*a*c;.
You may want to define a datatype of terms and simplifications on them, though, in case the coefficients of the equation are not simply integers but themselves solutions of previous equations. If this is the sort of thing you are after, look up “symbolic computation”. Some languages are better suited to this than others. I expect that elementary versions of what you are asking are actually used as examples in some ML tutorials, for instance (see chapter 9).

How to find the best (most informative) plotting limits?

I am developing a 2D plotting program for functions of 1 variable. It is designed to be very simple so that the user should not have to select the initial plot limits (or "range").
Are there known algorithms that can find the most interesting plot limits, knowing only the function f(x) ?
Notes:
What counts as "interesting" plot limits is intentionally left vague. This is part of the question: what is the most interesting part of the plot?
I already have an algorithm to determine the range of x values where the function f has finite values.
I am using Javascript, but any language is ok.
I don't want to use existing libraries.
The function f is restricted to expressions that the user can write with the basic math operators + - * / ^ and functions exp log abs sqrt sin cos tan acos asin atan ceil floor.
Using Google's graphing feature, you can see some examples of automatic limits. Typing graph sin(x) works pretty well, but graph exp(x) and graph log(x) don't really give the best results. Also, graph sin(x*100)*exp(-x^2) does not choose the limits I would call the most informative. But results of that quality would be good enough for me.
UPDATE:
I found that PlotRange in Mathematica does that automatically very well (see here). Is the source code available, or a reference explaining the algorithm? I could not find it anywhere.
UPDATE:
I started using an adaptive refinement algorithm to find informative plot ranges, inspired by this site. It is not working perfectly yet, but the current progress is implemented in my project here. You can try plotting a few functions and see how it works. When I have a fully working version I can post an answer.
I don't have a complete answer, but I might have some useful ideas to start with.
For me, interesting parts of a graph include:
All the roots of the function, except where there is an infinite number of roots (where we might be interested in no more than 8 of each).
All the roots of the first and second derivatives of the function, again except where there is an infinite number of roots.
The behaviour of the function around x = 0.
Locations of asymptotes, though I wouldn't want the graph to plot all the way to infinity.
To see the features of the graph, I'd like it to take up a "reasonable" amount of the rectangular graphing window. I think that might be achieved by aiming to have the integral of the absolute value of the function over the plotted range be in the range of, say, 20-80% of the graphing window's area.
Thus, a sketch of a heuristic for setting plot limits might be something like:
Find the range that includes all the roots of the function, its first and second derivatives, or (for functions with infinite numbers of roots) the (say) 8 roots closest to x=0.
If the range does not already include x=0, expand the range to include x=0.
Expand the x range by, say, 10% in each direction to ensure that all "interesting" points are well within the window.
Set the y range so that the integral of the absolute value of the function is (say) 30% of the area of the graphing window.
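The heuristic above can be roughed out in a few lines of Ruby. This is my own simplification: it scans only for roots of f itself (not its derivatives), finds them by sign changes over a fixed wide interval, and skips the y-range step; the function is assumed to return finite Floats or infinities rather than raise:

```ruby
# Pick [x_min, x_max] so the plot shows up to 8 roots nearest the origin,
# includes x = 0, and adds a margin around the interesting region.
def plot_limits(f, scan_min: -50.0, scan_max: 50.0, samples: 5000)
  step = (scan_max - scan_min) / samples
  roots = []
  samples.times do |i|
    a = scan_min + i * step
    b = a + step
    fa, fb = f.call(a), f.call(b)
    next unless fa.finite? && fb.finite?
    roots << (a + b) / 2 if fa * fb <= 0   # sign change => root in [a, b]
  end
  roots = roots.sort_by(&:abs).first(8)    # at most 8 roots nearest x = 0
  lo = [roots.min || -1.0, 0.0].min        # ensure the range includes x = 0
  hi = [roots.max || 1.0, 0.0].max
  pad = 0.1 * (hi - lo) + 0.5              # ~10% margin, floored for tiny ranges
  [lo - pad, hi + pad]
end
```

For sin(x) this yields a window covering several periods around the origin; for a root-free function like x^2 + 1 it falls back to a small default window around x = 0.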

Finding more than one root with root finding algorithms

What is the best way to find the roots of an equation that has more than one root? I understand that there is no single method that can solve every equation, and that you have to use more than one, but I can't find a root finding algorithm that can solve for more than one root in even the simplest instance.
For example: y = x^2
Although a root solving algorithm to solve a basic equation like this is helpful, it would need to be something I could adapt to solve an equation with more than two roots.
One more thing to note is that the equations wouldn't be your typical polynomials, but could be something such as ln(x^2) + x - 15 = 0
What is a root finding algorithm that could solve for this, or how could you adapt a root finding algorithm such as the bisection, Newton, or Brent method to solve this problem? (Assuming I'm correct that Newton's and Brent's methods find only one root at a time.)
I'd say that there's no general method to find all the roots of a general equation. However, one can try to devise methodologies once sufficient conditions have been specified. Even simple quadratic equations ax^2 + bx + c = 0 aren't completely trivial, because the existence of real roots depends on the sign of b^2 - 4ac, which isn't immediately obvious. So there are lots of techniques to apply, e.g. Newton-Raphson, but no general method for the general case, especially for equations like ln(x^2) + x - 15 = 0.
Bottom line: You need to isolate the roots yourself.
Details depend on the algorithm:
If you're using bisection or Brent's method, you need to come up with a set of intervals each containing a unique root.
If you're using Newton's method, you need to come up with a set of starting estimates (since it converges to a single root from a given starting point, and different starting points may or may not converge to different roots).
As everyone has said, it is impossible to provide a general algorithm to find all, or some, of the roots (where "some" is more than one). And how many roots is "some"? You cannot find all of the roots in general, since many functions have infinitely many roots.
Even methods like Newton do not always converge to a solution. I tend to like a good, fairly stable method that will converge to a solution under reasonable circumstances, such as a bracket where the function is known to change sign. You can find codes with good convergence behavior on simple roots that are still protected so they fall back to essentially a bisection scheme when the function is less well behaved.
So, given a decent root finding scheme, you can try simple things like deflation. Consider a simple function, like a first kind Bessel function. I'll do all my examples in MATLAB, but any tool with a stable, well-written root finder like MATLAB's fzero will suffice.
ezplot(@(x) besselj(0,x),[0,10])
grid on
f0 = @(x) besselj(0,x);
xroots(1) = fzero(f0,1)
xroots =
2.4048
From the plot, we can see there is a second root around 5 or 6.
Now, deflate f0 for that root, creating a new function based on f0, but one that lacks a root at xroots(1).
f1 = @(x) f0(x)./(x-xroots(1));
ezplot(f1,[0,10])
grid on
Note that in this curve, the root of f0 at xroots(1) has now been zapped away, as if it did not exist. Can we find a second root?
xroots(2) = fzero(f1,2)
xroots =
2.4048 5.5201
We can go on of course, but at some point this scheme will fail due to numerical issues. And that failure won't take too terribly long either.
A better scheme might be to (for 1-d problems) use a bracketing scheme, conjoined with a sampling methodology. I say better because it does not require modifying the initial function to deflate the roots. (For 2-d or higher, things get far more hairy of course.)
xlist = (0:1:10)';
flist = f0(xlist);
[xlist,flist]
ans =
0 1.0000
1.0000 0.7652
2.0000 0.2239
3.0000 -0.2601
4.0000 -0.3971
5.0000 -0.1776
6.0000 0.1506
7.0000 0.3001
8.0000 0.1717
9.0000 -0.0903
10.0000 -0.2459
As you can see, the function has sign changes in the intervals [2,3], [5,6], and [8,9]. A rootfinder that can search in a bracket will do here.
fzero(f0,[2,3])
ans =
2.4048
fzero(f0,[5,6])
ans =
5.5201
fzero(f0,[8,9])
ans =
8.6537
Just look for sign changes, then throw the known bracket into a root finder. This will give as many solutions as you can find brackets.
Be advised, there are serious problems with the above scheme. It will completely fail to find the double root of a simple function like f(x) = x^2, because no sign change exists there. And if you choose too coarse a sampling, you may have an interval with TWO roots in it but no sign change at the endpoints.
For example, consider the function f(x) = x^2-x, which has single roots at 0 and at 1. But if you sample that function at -1 and 2, you will find that it is positive at both points. There is no sign change, but there are two roots.
Again, NO method can be made perfect. You can ALWAYS devise a function that will cause any such numerical method to fail.
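The sign-change scan plus bracketed bisection described above is easy to sketch in any language; here is a version in Ruby (function and parameter names are mine), applied to the asker's example ln(x^2) + x - 15 = 0:

```ruby
# Scan [a, b] in uniform subintervals; wherever f changes sign, run plain
# bisection inside that bracket. Returns one root per bracket found.
def find_roots(f, a, b, steps: 200, tol: 1e-10)
  roots = []
  h = (b - a) / steps.to_f
  steps.times do |i|
    lo = a + i * h
    hi = a + (i + 1) * h
    flo = f.call(lo)
    next if flo * f.call(hi) > 0    # no sign change in this subinterval
    while hi - lo > tol             # plain bisection inside the bracket
      mid = 0.5 * (lo + hi)
      if f.call(mid) * flo <= 0
        hi = mid                    # root lies in [lo, mid]
      else
        lo = mid                    # root lies in [mid, hi]
        flo = f.call(mid)
      end
    end
    roots << 0.5 * (lo + hi)
  end
  roots
end

find_roots(->(x) { Math.log(x**2) + x - 15 }, 0.1, 20.0)  # one root near 10.33
```

The same caveats apply: a double root (no sign change) is invisible to this scheme, and too coarse a step can hide a pair of roots inside one subinterval.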

Locally weighted logistic regression

I have been trying to implement a locally-weighted logistic regression algorithm in Ruby. As far as I know, no library currently exists for this algorithm, and there is very little information available, so it's been difficult.
My main resource has been the dissertation of Dr. Kan Deng, in which he described the algorithm in what I feel is pretty light detail. My work so far on the library is here.
I've run into trouble when trying to calculate B (beta). From what I understand, B is a ((1+d) x 1) vector that represents the local weighting for a particular point. After that, pi (the probability of a positive output) for that point is the sigmoid function based on the B for that point. To get B, apply the Newton-Raphson algorithm iteratively a certain number of times, probably no more than ten.
Equation 4-4 on page 66, the Newton-Raphson algorithm itself, doesn't make sense to me. Based on my understanding of what X and W are, (x.transpose * w * x).inverse * x.transpose * w should be a ((1+d) x N) matrix, which doesn't match up with B, which is ((1+d) x 1). The only way that would work, then, is if e were an (N x 1) vector.
At the top of page 67, under the picture, though, Dr. Deng just says that e is a ratio, which doesn't make sense to me. Is e Euler's Constant, and it just so happens that that ratio is always 2.718:1, or is it something else? Either way, the explanation doesn't seem to suggest, to me, that it's a vector, which leaves me confused.
The use of pi' is also confusing to me. Equation 4-5, the derivative of the sigmoid function w.r.t. B, gives a constant multiplied by a vector, i.e. a vector. From my understanding, though, pi' is just supposed to be a number, to be multiplied by w to form the diagonal of the weight matrix W.
So, my two main questions here are, what is e on page 67 and is that the 1xN matrix I need, and how does pi' in equation 4-5 end up a number?
I realize that this is a difficult question to answer, so if there is a good answer then I will come back in a few days and give it a fifty point bounty. I would send an e-mail to Dr. Deng, but I haven't been able to find out what happened to him after 1997.
If anyone has any experience with this algorithm or knows of any other resources, any help would be much appreciated!
As far as I can see, this is just a version of Logistic regression in which the terms in the log-likelihood function have a multiplicative weight depending on their distance from the point you are trying to classify. I would start by getting familiar with an explanation of logistic regression, such as http://czep.net/stat/mlelr.pdf. The "e" you mention seems to be totally unconnected with Euler's constant - I think he is using e for error.
If you can call Java from Ruby, you may be able to make use of the logistic classifier in Weka described at http://weka.sourceforge.net/doc.stable/weka/classifiers/functions/Logistic.html - this says "Although original Logistic Regression does not deal with instance weights, we modify the algorithm a little bit to handle the instance weights." If nothing else, you could download it and look at its source code. If you do this, note that it is a fairly sophisticated approach - for instance, they check beforehand to see if all the points actually lie pretty much in some subspace of the input space, and project down a few dimensions if they do.
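If you'd rather stay in Ruby, the Newton-Raphson update for logistic regression with per-instance weights can be sketched with the stdlib Matrix class. This reads e as the error vector y - pi (as the first answer suggests), and puts the scalar w_i * pi_i * (1 - pi_i) terms on the diagonal of W, which is the scalar role pi' plays. All names and the toy data below are mine, not Dr. Deng's:

```ruby
require 'matrix'

def sigmoid(z)
  1.0 / (1.0 + Math.exp(-z))
end

# One Newton-Raphson step: beta + (X'WX)^-1 * X' * diag(w) * (y - pi),
# where W = diag(w_i * pi_i * (1 - pi_i)) and w holds the local weights.
def newton_step(x, y, w, beta)
  pi = (x * beta).to_a.flatten.map { |z| sigmoid(z) }    # predicted probabilities
  wdiag = Matrix.diagonal(*w.zip(pi).map { |wi, p| wi * p * (1 - p) })
  e = Matrix.column_vector(y.zip(pi).map { |yi, p| yi - p })   # error vector
  beta + (x.transpose * wdiag * x).inverse * x.transpose * Matrix.diagonal(*w) * e
end

# Toy fit: 6 points, labels not linearly separable, unit instance weights.
x = Matrix[[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 0, 1, 1]
w = [1.0] * 6
beta = Matrix.column_vector([0.0, 0.0])
8.times { beta = newton_step(x, y, w, beta) }
```

With e as an (N x 1) error vector, the dimensions in equation 4-4 work out: ((1+d) x N) times (N x 1) gives the ((1+d) x 1) update for B.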

What's a good weighting function?

I'm trying to perform some calculations on a non-directed, cyclic, weighted graph, and I'm looking for a good function to calculate an aggregate weight.
Each edge has a distance value in the range [1,∞). The algorithm should give greater importance to lower distances (it should be monotonically decreasing), and it should assign the value 0 for the distance ∞.
My first instinct was simply 1/d, which meets both of those requirements. (Well, technically 1/∞ is undefined, but programmers tend to let that one slide more easily than do mathematicians.) The problem with 1/d is that the function cares a lot more about the difference between 1/1 and 1/2 than the difference between 1/34 and 1/35. I'd like to even that out a bit more. I could use √(1/d) or ∛(1/d) or even ∜(1/d), but I feel like I'm missing out on a whole class of possibilities. Any suggestions?
(I thought of ln(1/d), but that goes to -∞ as d goes to ∞, and I can't think of a good way to push that up to 0.)
Later:
I forgot a requirement: w(1) must be 1. (This doesn't invalidate the existing answers; a multiplicative constant is fine.)
perhaps:
exp(-d)
edit: something along the lines of
exp(k(1-d)), k real
will fit your extra requirement (I'm sure you knew that but what the hey).
How about 1/ln (d + k)?
Some of the above answers are versions of a Gaussian distribution, which I agree is a good choice. The Gaussian or normal distribution is found often in nature. It is a B-spline basis function of infinite order.
One drawback to using it as a blending function is its infinite support requires more calculations than a finite blending function. A blend is found as a summation of product series. In practice the summation may stop when the next term is less than a tolerance.
If possible form a static table to hold discrete Gaussian function values since calculating the values is computationally expensive. Interpolate table values if needed.
How about this?
w(d) = (1 + k)/(d + k) for some large k
d = 2 + k would be the place where w(d) = 1/2
It seems you are in effect looking for a linear decrease, something along the lines of infinity - d. Obviously this solution is garbage as written, but since you are probably not using an arbitrary-precision data type for the distance, you could use yourDatatype.MaxValue - d to get a linearly decreasing function.
In fact you might consider using (yourDatatype.MaxValue - d) + 1 if you are using doubles, because you could then assign the weight 0 when your distance is "infinity" (since doubles actually have a value for that).
Of course you still have to consider implementation details like w(d) = double.infinity or w(d) = integer.MaxValue, but these should be easy to spot if you know the actual data types you are using ;)
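For comparison, here are the main candidates from this thread as Ruby lambdas, each normalized so that w(1) = 1 and w(∞) = 0. The particular k values are arbitrary choices of mine:

```ruby
# Candidate weighting functions, normalized so w(1) == 1 and w(Infinity) == 0.
INVERSE   = ->(d)           { 1.0 / d }                 # the original 1/d
EXP_DECAY = ->(d, k = 0.5)  { Math.exp(k * (1 - d)) }   # exp(k(1-d)), k real
RATIONAL  = ->(d, k = 10.0) { (1.0 + k) / (d + k) }     # (1+k)/(d+k), large k

# A larger k flattens RATIONAL's early decay: w(2) = 11/12 vs 1/2 for 1/d,
# which addresses the complaint that 1/d over-emphasizes small distances.
```

All three are monotonically decreasing on [1, ∞) and map Float::INFINITY to 0.0, so the "programmers let 1/∞ slide" trick works out of the box.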