Sympy nsolve vs Mathematica NSolve for multivariate polynomial equations - wolfram-mathematica

An interesting feature regarding NSolve[] in Mathematica is that it seems to provide all the solutions it can find (and hopefully it is exhaustive). For instance, as stated in the documentation examples:
NSolve[{x^2 + y^3 == 1, 2 x + 3 y == 4}, {x, y}]
would return an array of 3 solutions.
From what I could try, it seems to scale quite well even for, say, 20 multivariate polynomial equations in 20 variables, as can be seen in this notebook.
Alternatively, I am quite fond of using Sympy, which also features an nsolve function.
But there is a catch: this function requires a starting point x0, and it finds at most one solution - and even that only if you are lucky enough to have chosen a suitable x0.
Some users have suggested in the past to use a "multi-start method", where one chooses a grid of potential starting points and runs nsolve multiple times. But this doesn't seem to fit my problem: with a grid of d points per variable, the number of starting points grows exponentially, as d^20 for my problems with 20 variables. That doesn't square with Mathematica, which seems to run in the blink of an eye.
What is Mathematica doing to solve such systems so fast? Is it due to the nature of the equations? (Maybe some Groebner basis computation behind the scenes?)
Could it be done with Sympy?
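(For concreteness, a minimal SymPy sketch of the toy system above: solve, unlike nsolve, treats polynomial systems symbolically - Groebner-basis machinery under the hood - and needs no starting point. Whether that scales to the 20-variable case is exactly the open question.)

from sympy import symbols, solve, nsolve

x, y = symbols('x y')

# The toy system from the NSolve example, written as expressions == 0.
eqs = [x**2 + y**3 - 1, 2*x + 3*y - 4]

# solve() returns every solution it finds (1 real + 2 complex here),
# with no starting point required.
solutions = solve(eqs, [x, y])
print(len(solutions))              # 3, matching NSolve's count

# nsolve() is a local numerical root finder: it needs a starting point
# and returns at most the single root it converges to from there.
print(nsolve(eqs, [x, y], [8, -4]))   # start near the only real root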
Thank you for your help!

Related

Locally weighted logistic regression

I have been trying to implement a locally-weighted logistic regression algorithm in Ruby. As far as I know, no library currently exists for this algorithm, and there is very little information available, so it's been difficult.
My main resource has been the dissertation of Dr. Kan Deng, in which he described the algorithm in what I feel is pretty light detail. My work so far on the library is here.
I've run into trouble when trying to calculate B (beta). From what I understand, B is a ((1+d) x 1) vector that represents the local weighting for a particular point. After that, pi (the probability of a positive output) for that point is the sigmoid function based on the B for that point. To get B, apply the Newton-Raphson algorithm iteratively a certain number of times, probably no more than ten.
Equation 4-4 on page 66, the Newton-Raphson algorithm itself, doesn't make sense to me. Based on my understanding of what X and W are, (x.transpose * w * x).inverse * x.transpose * w should be a ((1+d) x N) matrix, which doesn't match up with B, which is ((1+d) x 1). The only way that would work, then, is if e were an (N x 1) vector.
At the top of page 67, under the picture, though, Dr. Deng just says that e is a ratio, which doesn't make sense to me. Is e Euler's constant, and does it just so happen that the ratio is always 2.718:1, or is it something else? Either way, the explanation doesn't seem to suggest, to me, that it's a vector, which leaves me confused.
The use of pi' is also confusing to me. Equation 4-5, the derivative of the sigmoid function w.r.t. B, gives a constant multiplied by a vector, i.e., a vector. From my understanding, though, pi' is just supposed to be a number, to be multiplied by w to form the diagonal of the weight matrix W.
So, my two main questions here are: what is e on page 67, and is it the (N x 1) vector I need? And how does pi' in equation 4-5 end up being a number?
I realize that this is a difficult question to answer, so if there is a good answer then I will come back in a few days and give it a fifty point bounty. I would send an e-mail to Dr. Deng, but I haven't been able to find out what happened to him after 1997.
If anyone has any experience with this algorithm or knows of any other resources, any help would be much appreciated!
As far as I can see, this is just a version of Logistic regression in which the terms in the log-likelihood function have a multiplicative weight depending on their distance from the point you are trying to classify. I would start by getting familiar with an explanation of logistic regression, such as http://czep.net/stat/mlelr.pdf. The "e" you mention seems to be totally unconnected with Euler's constant - I think he is using e for error.
If you can call Java from Ruby, you may be able to make use of the logistic classifier in Weka described at http://weka.sourceforge.net/doc.stable/weka/classifiers/functions/Logistic.html - this says "Although original Logistic Regression does not deal with instance weights, we modify the algorithm a little bit to handle the instance weights." If nothing else, you could download it and look at its source code. If you do this, note that it is a fairly sophisticated approach - for instance, they check beforehand to see if all the points actually lie pretty much in some subspace of the input space, and project down a few dimensions if they do.
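To make the "multiplicative weight in the log-likelihood" idea concrete, here is a rough NumPy sketch (not Dr. Deng's exact formulation; the variable names are mine) of Newton-Raphson for instance-weighted logistic regression. The per-point weights w would come from a distance kernel around the query point, and e = y - p is an N x 1 residual ("error") vector, which is consistent with the dimension analysis in the question.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_logistic_fit(X, y, w, n_iter=10):
    # Newton-Raphson for logistic regression where training point i carries
    # a multiplicative weight w[i] in the log-likelihood.
    #   X: (N, 1+d) design matrix (first column all ones for the intercept)
    #   y: (N,) labels in {0, 1}
    #   w: (N,) nonnegative instance weights (e.g. from a distance kernel)
    # Returns beta, a (1+d,) coefficient vector.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)                    # predicted probabilities, shape (N,)
        e = y - p                                # error vector, shape (N,)
        grad = X.T @ (w * e)                     # gradient of the weighted log-likelihood
        W = np.diag(w * p * (1.0 - p))           # curvature weights on the diagonal
        H = X.T @ W @ X                          # (1+d) x (1+d) Hessian (negated)
        beta = beta + np.linalg.solve(H, grad)   # Newton step
    return beta

# Tiny made-up example: weights peak around a hypothetical query point.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=1.0, size=50) > 0).astype(float)
query = np.array([0.2, -0.1])
w = np.exp(-np.sum((X[:, 1:] - query) ** 2, axis=1))
print(weighted_logistic_fit(X, y, w))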

Doing probabilistic calculations on a higher abstraction level

To the downvoters: this isn't a question about mathematics, it's a question about the programming language Mathematica.
One of the prime characteristics of Mathematica is that it can deal with many things symbolically. But if you come to think about it, many of the symbolic features are actually only halfway symbolic.
Take vectors, for instance. We can have a symbolic vector like {x,y,z}, do a matrix multiplication with a matrix full of symbols and end up with a symbolic result, so we might consider that symbolic vector algebra. But we all know that, right out of the box, Mathematica does not allow you to say that a symbol x is a vector and that, given a matrix A, A . x is a vector too. That's a higher level of abstraction, one that Mathematica (currently) does not deal with very well.
Similarly, Mathematica knows how to find the 5th derivative of a function that's defined in terms of nothing but symbols, but it's not well geared towards finding the r-th derivative (see the "How to find a function's rth derivative when r is symbolic in Mathematica?" question).
Furthermore, Mathematica has extensive Boolean algebra capabilities, some of them stone-age old, but many recently obtained in version 7. In version 8 we got Probability and friends (such as Conditioned), which allow us to reason about probabilities of random variables with given distributions. It's a really magnificent addition which helps me a lot in familiarizing myself with this domain, and I enjoy working with it tremendously. However,...
I was discussing with a colleague certain rules of probabilistic logic, like the familiar

P(C | A) == P(C && A) / P(A)

i.e., the conditional probability of event/state/outcome C given that event/state/outcome A is true.
Specifically, we were looking at this one:
and although I had spoken highly of Mathematica's Probability just before, I realized that I wouldn't know how to solve this right away with Mathematica. Again, just as with abstract vectors and matrices, and symbolic derivatives, this seems to be an abstraction level too high. Or is it? My question is:
Could you find a way to find the truth or falsehood in the above and similar equations using a Mathematica program?
>> Mathematica does not allow you to say that a symbol x is a vector
Sure it does... close enough, anyway: you can say that it's a collection of Reals. It's called assumptions or conditioning, depending on what you want to do.
Refine[Sqrt[x]*Sqrt[y]]
The above doesn't refine because it assumes x and y can be any symbols, but if you narrow their scope, you get results:
Assuming[ x > 0 && y > 0, Refine[Sqrt[x]*Sqrt[y]]]
It would be very nice to have the ability to say: Element[x,Reals^2] (2-dimensional real vector), maybe in Mathematica 9. :-)
As for this problem:
>> Could you find a way to find the truth or falsehood in the above and similar equations using a Mathematica program?
Please refer to my answer (first one) on this question to see a symbolic approach to Bayes theorem:
https://stackoverflow.com/questions/8378336/how-do-you-work-out-conditional-probabilities-in-mathematica-is-it-possible
Just glanced at this and found an example from the documentation on Conditioned:
In[1]:= c = x^2 < 30; a = x > 1;
(Sorry for the formatting here...)
In[2]:= Probability[c \[Conditioned] a, x \[Distributed] PoissonDistribution[2]] ==
Probability[c && a, x \[Distributed] PoissonDistribution[2]] / Probability[a, x \[Distributed] PoissonDistribution[2]]
Which evaluates to True and corresponds to a less general version of the first example you gave.
I'll revisit this later tonight if I have time.

Performance problems: Solving an inequality with several assumptions in Mathematica

I need to prove an inequality (or find a counterexample) given several assumptions (also inequalities). Unfortunately, the inequality to prove is quite a long and complicated expression. There are about 15 variables, and FullSimplify's output fills several A4 pages. For examples with fewer variables, FindInstance helps to find a counterexample or returns {} if the inequality is true. I also tried to use Reduce in this way:
Reduce[
 Implies[
  assumption1 && assumption2,
  inequality
 ],
 Reals
]
For simple examples this outputs True if the inequality holds. But in my case, after several hours of running time, Mathematica needed 5-6 GB of RAM (and swap), so I had to abort the process.
Is there anything that I could do with Mathematica to improve the performance?
You will find a very nice paper on Mma CAD algorithms here
The cylindrical algebraic decomposition (CAD), which Mma uses, scales double-exponentially in the number of variables.
Newer methods are double exponential in the number of quantifier alternations.
I think you'll have no luck using only the Mma internal engine, but you may roll your own approach based on the symmetries of your problem (if any).

Advice on Optimization routine/Constraints to use

I am trying to do some numerics and am having a difficult time determining the appropriate way to solve a problem, so I am looking for some feedback.
So far I have done all my work in Mathematica, however, I believe that the time has come where I need more control over my algorithms.
I can't post Images yet so here is a link to them
Where H is the Heaviside step function. C(k) is just the FT of C(r), and m = 4. N in my case is 2000, so you can see that omega is merely the sum of a large number of exponentials. rho is just the density. C(r), as you can see, has four different a coefficients because m = 4. IRISM is ultimately a function of those four a coefficients.
I have these three equations working correctly, I think, within Mathematica; however, I am trying to minimize IRISM and find the 4 a values. The problem I am having is that, for obvious reasons, there is a discontinuity when the argument of the log inside the integral is equal to zero. I cannot seem to find a way to modify the Mathematica algorithms (they are black boxes, is that the right term?) so as to check the trial a values. I was using Nelder-Mead and Differential Evolution and attempting different constraints. However, I seemed to get either imaginary results (obviously from a negative log) or, when I constrained things well enough to avoid that, only local minima, as my results did not match the "correct" results. I tried a few times with minimization algorithms that use gradients, but I did not have much luck.
I think my best way to move forward is to just write a minimization routine from scratch, or modify other code, so that I can check IRISM for the discontinuity ahead of integration. I have read up a bit on penalty functions, log-barriers, etc., but being somewhat new to programming, I was hoping someone might be able to let me know what a good approach would be to start with. I think, more than anything, there is almost too much information out there on optimization, and I am finding it difficult to know where to begin.
Edit: Here is the raw input. If I need to post it in a different way please let me know.
OverHat[c][a1_, a2_, a3_, a4_, k_] := (a1*(4*Pi*(Sin[k] - k*Cos[k])))/k^3 +
(a2*(4*Pi*(k*Sin[k] + 2*Cos[k] - 2)))/k^4 +
(a3*(8*Pi*(2*k - 3*Sin[k] + k*Cos[k])))/k^5 +
(a4*(-(24*Pi*(k^2 + k*Sin[k] + 4*Cos[k] - 4))))/k^6
Subscript[OverHat[\[Omega]], \[Alpha]\[Gamma]][k_, \[Alpha]\[Gamma]_, n_] :=
Exp[(-k^2)*\[Alpha]\[Gamma]*((n - \[Alpha]\[Gamma])/(6*n))]
OverHat[\[Omega]][k_] := Sum[Subscript[OverHat[\[Omega]], \[Alpha]\[Gamma]][k, \[Alpha]\[Gamma], n],
{\[Alpha]\[Gamma], 1, n}] /. n -> 2000
IRISM[a1_, a2_, a3_, a4_, \[Rho]_, kmax_] :=
\[Rho]^2*(1/15)*(20*a1 - 5*a2 + 2*a3 - a4)*Pi -
(1/(8*Pi^3))*NIntegrate[(\[Rho]*OverHat[\[Omega]][k]*OverHat[c][a1, a2, a3, a4, k] +
Log[1 - \[Rho]*OverHat[\[Omega]][k]*OverHat[c][a1, a2, a3, a4, k]])*4*Pi*k^2,
{k, 0, kmax}, WorkingPrecision -> 80]
NMinimize[IRISM[a1, a2, a3, a4, 0.9, 30], {a1, a2, a3, a4},
Method -> "DifferentialEvolution"]
Mathematica's FindMinimum aborts if it sees an imaginary number. This can happen even if your objective is real-valued inside the constraints, because the default Barrier method has poor accuracy control and can occasionally step out of bounds. The simplest way around it is to wrap your objective inside Re. You may get better answers if you post complete code.
Some general advice:
It's easier to try to simplify your objective for Mathematica than re-implement optimization algorithms. The reason is that one algorithm failing often means it's a difficult problem, and other algorithms will fail as well.
I once had a problem where FindMinimum gave warnings and failed to converge to the correct minimum, which I could determine analytically. It happened with different methods, and it made sense when I plotted the objective surface, below
(source: yaroslavvb.com)
In this case, you can see the problem is very badly conditioned at the minimum (it's almost a plateau), and the minimum is hard to localize.
When you have inequality constraints, the default method is the Barrier method, which is expensive and offers poor precision control. A very inefficient thing to do is to specify equality constraints as pairs of inequalities, i.e., instead of a == b, using a >= b and a <= b. This can be 3-10 times slower, and also numerically worse -- a and b might be only approximately equal in the result.
Ideally the goal is to get a problem which is convex, doesn't have any inequality constraints and is well conditioned.
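If you do end up guarding the objective yourself, the penalty-function idea mentioned in the question can be sketched roughly as below (illustrative Python/SciPy with a made-up stand-in for the objective; the real problem would evaluate the IRISM integrand where the log argument is computed):

import numpy as np
from scipy.optimize import minimize

def safe_objective(a):
    # Stand-in for an objective containing Log[1 - g(a)]; this g is a
    # made-up placeholder, not the actual IRISM expression.
    g = 0.1 * a[0] ** 2 + a[1] * a[2] - 0.05 * a[3]
    log_argument = 1.0 - g
    if log_argument <= 0.0:
        # Return a large penalty that grows with the violation, so the
        # simplex gets pushed back toward the feasible region instead of
        # the optimizer ever seeing NaN or complex values.
        return 1e6 * (1.0 + abs(log_argument))
    return (a[0] - 1.0) ** 2 + np.log(log_argument) ** 2

result = minimize(safe_objective, x0=np.zeros(4), method="Nelder-Mead")
print(result.x, result.fun)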

Efficient evaluation of hypergeometric functions

Does anyone have experience with algorithms for evaluating hypergeometric functions? I would be interested in general references, but I'll describe my particular problem in case someone has dealt with it.
My specific problem is evaluating a function of the form 3F2(a, b, 1; c, d; 1) where a, b, c, and d are all positive reals and c+d > a+b+1. There are many special cases that have a closed-form formula, but as far as I know there are no such formulas in general. The power series centered at zero converges at 1, but very slowly; the ratio of consecutive coefficients goes to 1 in the limit. Maybe something like Aitken acceleration would help?
I tested Aitken acceleration and it does not seem to help for this problem (nor does Richardson extrapolation). This probably means Pade approximation doesn't work either. I might have done something wrong though, so by all means try it for yourself.
I can think of two approaches.
One is to evaluate the series at some point such as z = 0.5 where convergence is rapid to get an initial value and then step forward to z = 1 by plugging the hypergeometric differential equation into an ODE solver. I don't know how well this works in practice; it might not, due to z = 1 being a singularity (if I recall correctly).
The second is to use the definition of 3F2 in terms of the Meijer G-function. The contour integral defining the Meijer G-function can be evaluated numerically by applying Gaussian or doubly-exponential quadrature to segments of the contour. This is not terribly efficient, but it should work, and it should scale to relatively high precision.
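As a cross-check for whichever approach you implement, mpmath exposes 3F2 directly, and it is worth trying as a reference value at z = 1 for convergent parameter choices; recomputing at higher precision is a cheap way to confirm the value has actually converged. A rough sketch, with arbitrary example parameters satisfying c + d > a + b + 1:

from mpmath import mp, hyp3f2

# Arbitrary example parameters with c + d > a + b + 1, so the series converges at z = 1.
a, b, c, d = 1.5, 2.25, 3.0, 4.0

mp.dps = 30
v30 = hyp3f2(a, b, 1, c, d, 1)   # 3F2(a, b, 1; c, d; 1)
mp.dps = 60
v60 = hyp3f2(a, b, 1, c, d, 1)   # recompute at higher precision as a sanity check
print(v30)
print(v60)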
Is it correct that you want to sum a series where you know the ratio of successive terms and it is a rational function?
I think Gosper's algorithm and the rest of the tools for proving hypergeometric identities (and finding them) do exactly this, right? (See Wilf and Zeilberger's A=B book online.)
