Calculation of Derivatives in pyGAM - curve-fitting

I would like to calculate the derivative of a fitted GAM model a at specific points in pyGAM. The model contains only spline terms, such that it should be effectively a BSpline.
Indeed, use of pygam.utils.b_spline_basis gives a matrix, which I can multiply by a.coefsto recover the pyGAM prediction from a.predict.
The idea was to use the analytic derivative of the BSpline to get the derivative. See here or simply Wikipedia's BSpline article for more information.
I came up with this:
def GetGAM(x,y,err):
if len(x)>5:
gam=pygam.LinearGAM(pygam.s(0, n_splines=len(x)),penalties=__PenaltyD3,lam=0,fit_intercept=False)
gam=gam.fit(x,y, weights=1/err)
#print('done')
return gam
x1 = np.linspace(0,50,10)
y1=x1**2
err=np.ones(10)
a = GetGAM(x1,y1,err)
x0=pygam.utils.b_spline_basis(x1,pygam.utils.gen_edge_knots(x1,'numerical'),spline_order=3,n_splines=len(a.coef_), periodic=False)
x=pygam.utils.b_spline_basis(x1,pygam.utils.gen_edge_knots(x1[:-1],'numerical'),spline_order=2,n_splines=len(a.coef_), periodic=False)
To recover y1: x0#a.coefs_works nicely.
To get the derivative I have
der2=x[:,1:]#(a.coef_[1:]-a.coef_[:-1])/((edges[1]-edges[0])/(len(a.coef_)))
I have also tried various variants of this, but could not get the solution (2*x1) to a satisfying degree.

I answered a related question, and suggested a numerical method for obtaining the derivative(s) of a pygam fit to data. Here is the link that may meet your requirements:-
Numerical derivative(s) of pygam spline.

Related

Wolfram alpha is able to integrate an indefinite integral but not a definite integral of the same function?

My question is regarding integration. I have a complex function that needs to be integrated and its a definite integral. The thing is when I use Wolfram Alpha to integrate this function it gives me nothing i.e its unable to compute it. However if I remove the boundaries of integration i.e I make my integral an indefinite integral, Wolfram Alpha is able to compute. Now my question is
Can I take the result I obtained for the indefinite integral and just evaluate for the boundary limits to evaluate my definite integral ?
If my analysis is correct, then why wouldn't Wolfram alpha give the result anyways?
using Wolfram Alpha, if I try
integrate(exp(-v)/(1+sv^-1))
then I get the following result
-e^(-v)-e^s s Ei(-s-v)
While if I try
integrate(exp(-v)/(1+sv^-1),{v,1,+infinity})
I get nothing!
since you tagged this Mathematica:
by specifying an appropriate assumption on s we get the expected result:
Integrate[Exp[-v]/(1 + s/v) , {v, 1, Infinity}, Assumptions -> {s > -1}]
--> 1/E + E^s s ExpIntegralEi[-1 - s]
I don't know if alpha has some similar syntax to add assumptions..
additionally if we try a finite integral:
Integrate[Exp[-v]/(1 + s/v) , {v, 1, 2} ]
mathematica returns a conditional expression that tells us the result is valid for s>-1 or s<-2. For some reason it doesn't give such result for the infinite case however.
Yes, you can take the result obtained for the indefinite integral and use to calculate the definite integral. When I try to run your request at Wolfram Alpha, here's what I get:
As you can see in the highlighted portion at the bottom left of the above picture, Wolfram Alpha didn't complete your request because it exceeded the standard computation time. This is because they need to offer some extra features for Wolfram Alpha Pro users to pay for the service. One of this features is extended computation time.
Wolfram Alpha is a business, and this is one of the ways it makes money. See for yourself, it'll offer you the pro service if you click the "Try again with additional computational time" on the bottom right.
If you just break down the definite integration between first the indefinite integral (which it can handle) and then calculate the boundary values and take the difference, it seems to work fine:
This is mathematically correct because that is how definite integrals are calculated.
However your input has an sv in the dividend. Wolfram Alpha is taking it to mean s*v, which might not be what you meant—if sv is a variable on it's own, I suggest you rename it to s or something else. The point is that if s is indeed a variable, if you take a look at the plot in the answer, there seems to be a ridge due to the -∞ term, so for some values of s that ridge might be within your integration curve, and then the integral can't be calculated, as Bill pointed out in his comment to your question.

How to find the maximum value of a function in Sympy?

These days I am trying to redo shock spectrum of single degree of freedom system using Sympy. The problem can reduce to find maximum value of a function. Following are two cases I cannot figure out how to do.
The first one is
tau,t,t_r,omega,p0=symbols('tau,t,t_r,omega,p0',positive=True)
h=expand(sin(omega*(t-tau)))
f=simplify(integrate(p0*tau/t_r*h,(tau,0,t_r))+integrate(p0*h,(tau,t_r,t)))
The final goal is to obtain maximum absolute value of f (The variable is t). The direct way is
df=diff(f,t)
sln=solve(simplify(df),t)
simplify(f.subs(t,sln[1]))
Here is the result, I tried many ways, but I can not simplify any further.
Therefore, I tried another way. Because I need the maximum absolute value and the location where abs(f) is maximum happens at the same location of square of f, we can calculate square of f first.
df=expand_trig(diff(expand(f)**2,t))
sln=solve(df,t)
simplify(f.subs(t,sln[2]))
It seems the answer is almost the same, just in another form.
The expected answer is a sinc function plus a constant as following:
Therefore, the question is how to get the final presentation.
The second one may be a little harder. The question can be reduced to find the maximum value of f=sin(pi*t/t_r)-T/2/t_r*sin(2*pi/T*t), in which t_r and T are two parameters. The maximum located at different peak when the ratio of t_r and T changes. And I do not find a way to solve it in Sympy. Any suggestion? The answer can be represented in following figure.
The problem is the log(exp(I*omega*t_r/2)) term. SymPy is not reducing this to I*omega*t_r/2. SymPy doesn't simplify this because in general, log(exp(x)) != x, but rather log(exp(x)) = x + 2*pi*I*n for some integer n. But in this case, if you replace log(exp(I*omega*t_r/2)) with omega*t_r/2 or omega*t_r/2 + 2*pi*I*n, it will be the same, because it will just add a 2*pi*I*n inside the sin.
I couldn't figure out any functions that force this simplification, but the easiest way is to just do a substitution:
In [18]: print(simplify(f.subs(t,sln[1]).subs(log(exp(I*omega*t_r/2)), I*omega*t_r/2)))
p0*(omega*t_r - 2*sin(omega*t_r/2))/(omega**2*t_r)
That looks like the answer you are looking for, except for the absolute value (I'm not sure where they should come from).

Finding a value of a variant in a permutation equation [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
I have a math problem that I can't solve: I don't know how to find the value of n so that
365! / ((365-n)! * 365^n) = 50%.
I am using the Casio 500ms scientific calculator but I don't know how.
Sorry because my question is too easy, I am changing my career so I have to review and upgrade my math, the subject that I have neglected for years.
One COULD in theory use a root-finding scheme like Newton's method, IF you could take derivatives. But this function is defined only on the integers, since it uses factorials.
One way out is to recognize the identity
n! = gamma(n+1)
which will effectively allow you to extend the function onto the real line. The gamma function is defined on the positive real line, though it does have singularities at the negative integers. And of course, you still need the derivative of this expression, which can be done since gamma is differentiable.
By the way, a danger with methods like Newton's method on problems like this is it may still diverge into the negative real line. Choose poor starting values, and you may get garbage out. (I've not looked carefully at the shape of this function, so I won't claim for what set of starting values it will diverge on you.)
Is it worth jumping through the above set of hoops? Of course not. A better choice than Newton's method might be something like Brent's algorithm, or a secant method, which here will not require you to compute the derivative. But even that is a waste of effort.
Recognizing that this is indeed a problem on the integers, one could use a tool like bisection to resolve the solution extremely efficiently. It never requires derivatives, and it will work nicely enough on the integers. Once you have resolved the interval to be as short as possible, the algorithm will terminate, and take vary few function evaluations in the process.
Finally, be careful with this function, as it does involve some rather large factorials, which could easily overflow many tools to evaluate the factorial. For example, in MATLAB, if I did try to evaluate factorial(365):
factorial(365)
ans =
Inf
I get an overflow. I would need to move into a tool like the symbolic toolbox, or my own suite of variable precision integer tools. Alternatively, one could recognize that many of the terms in these factorials will cancel out, so that
365! / (365 - n)! = 365*(365-1)*(365-2)*...*(365-n+1)
The point is, we get an overflow for such a large value if we are not careful. If you have a tool that will not overflow, then use it, and use bisection as I suggested. Here, using the symbolic toolbox in MATLAB, I get a solution using only 7 function evaluations.
f = #(n) vpa(factorial(sym(365))/(factorial(sym(365 - n))*365^sym(n)));
f(0)
ans =
1.0
f(365)
ans =
1.4549552156187034033714015903853e-157
f(182)
ans =
0.00000000000000000000000095339164972764493041114884521295
f(91)
ans =
0.000004634800180846641815683109605743
f(45)
ans =
0.059024100534225072005461014516788
f(22)
ans =
0.52430469233744993108665513602619
f(23)
ans =
0.49270276567601459277458277166297
Or, if you can't take an option like that, but do have a tool that can evaluate the log of the gamma function, AND you have a rootfinder available as MATLAB does...
f = #(n) exp(gammaln(365+1) - gammaln(365-n + 1) - n*log(365));
fzero(#(n) f(n) - .5,10)
ans =
22.7677
As you can see here, I used the identity relating gamma and the factorial function, then used the log of the gamma function, in MATLAB, gammaln. Once all the dirty work was done, then I exponentiated the entire mess, which will be a reasonable number. Fzero tells us that the cross-over occurs between 22 and 23.
If a numerical approximation is ok, ask Wolfram Alpha:
n ~= -22.2298272...
n ~= 22.7676903...
I'm going to assume you have some special reason for wanting an actual algorithm, even though you only have one specific problem to solve.
You're looking for a value n where...
365! / ((365-n)! * 365^n) = 0.5
And therefore...
(365! / ((365-n)! * 365^n)) - 0.5 = 0.0
The general form of the problem is to find a value x such that f(x)=0. One classic algorithm for this kind of thing is the Newton-Raphson method.
[EDIT - as woodchips points out in the comment, the factorial is an integer-only function. My defence - for some problems (the birthday problem among them) it's common to generalise using approximation functions. I remember the Stirling approximation of factorials being used for the birthday problem - according to this, Knuth uses it. The Wikipedia page for the Birthday problem mentions several approximations that generalise to non-integer values.
It's certainly bad that I didn't think to mention this when I first wrote this answer.]
One problem with that is that you need the derivative of that function. That's more a mathematics issue, though you can estimate the derivative at any point by taking values a short distance either side.
You can also look at this as an optimisation problem. The general form of optimisation problems is to find a value x such that f(x) is maximised/minimised. In your case, you could define your function as...
f(x)=((365! / ((365-n)! * 365^n)) - 0.5)^2
Because of the squaring, the result can never be negative, so try to minimise. Whatever value of x gets you the smallest f(x) will also give you the result you want.
There isn't so much an algorithm for optimisation problems as a whole field - the method you use depends on the complexity of your function. However, this case should be simple so long as your language can cope with big numbers. Probably the simplest optimisation algorithm is called hill-climbing, though in this case it should probably be called rolling-down-the-hill. And as luck would have it, Newton-Raphson is a hill-climbing method (or very close to being one - there may be some small technicality that I don't remember).
[EDIT as mentioned above, this won't work if you need an integer solution for the problem as actually stated (rather than a real-valued approximation). Optimisation in the integer domain is one of those awkward issues that helps make optimisation a field in itself. The branch and bound is common for complex functions. However, in this case hill-climbing still works. In principle, you can even still use a tweaked version of Newton-Raphson - you just have to do some rounding and check that you don't keep rounding back to the same place you started if your moves are small.]

Computationally simple pseudo-Gaussian distribution with varying mean and standard deviation?

This picture from Wikipedia has a nice example of the sort of functions I'd ideally like to generate:
Right now I'm using the Irwin-Hall Distribution, which is more or less a polynomial approximation of the Gaussian distribution...basically, you use uniform random number generator and iterate it x times, and take the average. The more iterations, the more like a Gaussian Distribution it is.
It's pretty nice; however I'd like to be able to have one where I can vary the mean. For example, let's say I wanted a number between the range 0 and 10, but around 7. Like, the mean (if I repeated this function multiple times) would turn out to be 7, but the actual range is 0-10.
Is there one I should look up, or should I work on doing some fancy maths with standard Gaussian distributions?
I see a contradiction in your question. From one side you want normal distribution which is symmetrical by it's nature, from other side you want the range asymmetrically disposed to mean value.
I suspect you should try to look at other distributions density functions of which are like bell curve but asymmetrical. Like log distribution or beta distribution.
Look into generating normal random variates. You can generate pairs of normal random variates X = N(0,1) and tranform it into ANY normal random variate Y = N(m,s) (Y = m + s*X).
Sounds like the Truncated Normal distribution is just what the doctor ordered. It is not "computationally simple" per se, but easy to implement if you have an existing implementation of a normal distribution.
You can just generate the distribution with the mean you want, standard deviation you want, and the two ends wherever you want. You'll have to do some work beforehand to compute the mean and standard deviation of the underlying (non-truncated) normal distribution to get the mean for the TN that you want, but you can use the formulae in that article. Also note that you can adjust the variance as well using this method :)
I have Java code (based on the Commons Math framework) for both an accurate (slower) and quick (less accurate) implementation of this distribution, with PDF, CDF, and sampling.

Is there a way to predict unknown function value based on its previous values

I have values returned by unknown function like for example
# this is an easy case - parabolic function
# but in my case function is realy unknown as it is connected to process execution time
[0, 1, 4, 9]
is there a way to predict next value?
Not necessarily. Your "parabolic function" might be implemented like this:
def mindscrew
#nums ||= [0, 1, 4, 9, "cat", "dog", "cheese"]
#nums.pop
end
You can take a guess, but to predict with certainty is impossible.
You can try using neural networks approach. There are pretty many articles you can find by Google query "neural network function approximation". Many books are also available, e.g. this one.
If you just want data points
Extrapolation of data outside of known points can be estimated, but you need to accept the potential differences are much larger than with interpolation of data between known points. Strictly, both can be arbitrarily inaccurate, as the function could do anything crazy between the known points, even if it is a well-behaved continuous function. And if it isn't well-behaved, all bets are already off ;-p
There are a number of mathematical approaches to this (that have direct application to computer science) - anything from simple linear algebra to things like cubic splines; and everything in between.
If you want the function
Getting esoteric; another interesting model here is genetic programming; by evolving an expression over the known data points it is possible to find a suitably-close approximation. Sometimes it works; sometimes it doesn't. Not the language you were looking for, but Jason Bock shows some C# code that does this in .NET 3.5, here: Evolving LINQ Expressions.
I happen to have his code "to hand" (I've used it in some presentations); with something like a => a * a it will find it almost instantly, but it should (in theory) be able to find virtually any method - but without any defined maximum run length ;-p It is also possible to get into a dead end (evolutionary speaking) where you simply never recover...
Use the Wolfram Alpha API :)
Yes. Maybe.
If you have some input and output values, i.e. in your case [0,1,2,3] and [0,1,4,9], you could use response surfaces (basicly function fitting i believe) to 'guess' the actual function (in your case f(x)=x^2). If you let your guessing function be f(x)=c1*x+c2*x^2+c3 there are algorithms that will determine that c1=0, c2=1 and c3=0 given your input and output and given the resulting function you can predict the next value.
Note that most other answers to this question are valid as well. I am just assuming that you want to fit some function to data. In other words, I find your question quite vague, please try to pose your questions as complete as possible!
In general, no... unless you know it's a function of a particular form (e.g. polynomial of some degree N) and there is enough information to constrain the function.
e.g. for a more "ordinary" counterexample (see Chuck's answer) for why you can't necessarily assume n^2 w/o knowing it's a quadratic equation, you could have f(n) = n4 - 6n3 + 12n2 - 6n, which has for n=0,1,2,3,4,5 f(n) = 0,1,4,9,40,145.
If you do know it's a particular form, there are some options... if the form is a linear addition of basis functions (e.g. f(x) = a + bcos(x) + csqrt(x)) then using least-squares can get you the unknown coefficients for the best fit using those basis functions.
See also this question.
You can apply statistical methods to try and guess the next answer, but that might not work very well if the function is like this one (c):
int evil(void){
static int e = 0;
if(50 == e++){
e = e * 100;
}
return e;
}
This function will return nice simple increasing numbers then ... BAM.
That's a hard problem.
You should check out the recurrence relation equation for special cases where it could be possible such a task.

Resources