Doing probabilistic calculations on a higher abstraction level - wolfram-mathematica

To the downvoters: this isn't a question about mathematics, it's a
question about the programming language Mathematica.
One of the prime characteristics of Mathematica is that it can deal with many things symbolically. But if you come to think about it, many of the symbolic features are actually only halfway symbolic.
Take vectors, for instance. We can have a symbolic vector like {x,y,z}, multiply it by a matrix full of symbols and end up with a symbolic result, so we might consider that symbolic vector algebra. But we all know that, right out of the box, Mathematica does not allow you to say that a symbol x is a vector and that, given a matrix A, A . x is a vector too. That's a higher level of abstraction, one that Mathematica (currently) does not deal with very well.
Similarly, Mathematica knows how to find the 5th derivative of a function that's defined in terms of nothing but symbols, but it's not well geared towards finding the rth derivative (see the "How to find a function's rth derivative when r is symbolic in Mathematica?" question).
Furthermore, Mathematica has extensive Boolean algebra capabilities, some stone age old, but many recently obtained in version 7. In version 8 we got Probability and friends (such as Conditioned), which allow us to reason with probabilities of random variables with given distributions. It's a really magnificent addition which helps me a lot in familiarizing myself with this domain, and I enjoy working with it tremendously. However,...
I was discussing with a colleague certain rules of probabilistic logic like the familiar
P(C | A) = P(C ∩ A) / P(A),
i.e., the conditional probability of event/state/outcome C given event/state/outcome A is true.
Specifically, we were looking at this one:
and although I had spoken highly about Mathematica's Probability just before, I realized that I wouldn't know how to solve this right away with Mathematica. Again, just as with abstract vectors and matrices, and symbolic derivatives, this seems to be an abstraction level too high. Or is it? My question is:
Could you find a way to find the truth or falsehood in the above and similar equations using a Mathematica program?

>> Mathematica does not allow you to say that a symbol x is a vector
Sure it does... Close enough anyway... that it's a collection of Reals. It's called assumptions or conditioning, depending on what you want to do.
Refine[Sqrt[x]*Sqrt[y]]
The above doesn't refine because it assumes x and y can be any symbol, but if you narrow their scope, you get results:
Assuming[ x > 0 && y > 0, Refine[Sqrt[x]*Sqrt[y]]]
It would be very nice to have the ability to say: Element[x,Reals^2] (2-dimensional real vector), maybe in Mathematica 9. :-)
As for this problem:
>> Could you find a way to find the truth or falsehood in the above and similar equations using a Mathematica program?
Please refer to my answer (first one) on this question to see a symbolic approach to Bayes theorem:
https://stackoverflow.com/questions/8378336/how-do-you-work-out-conditional-probabilities-in-mathematica-is-it-possible

Just glanced at this and found an example from the documentation on Conditioned:
In[1]:= c = x^2 < 30; a = x > 1;
(Sorry for the formatting here...)
In[2]:= Probability[c \[Conditioned] a, x \[Distributed] PoissonDistribution[2]] ==
Probability[c && a, x \[Distributed] PoissonDistribution[2]] / Probability[a, x \[Distributed] PoissonDistribution[2]]
Which evaluates to True and corresponds to a less general version of the first example you gave.
I'll revisit this later tonight if I have time.
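For readers without Mathematica at hand, the same numeric check can be sketched in plain Python. This only confirms the identity for the one concrete case quoted above (Poisson with mean 2, c = x^2 < 30, a = x > 1); it is not a symbolic proof. The helper names and the truncation of the Poisson support at 100 are assumptions made for the illustration.

```python
import math

def poisson_pmf(k, mu):
    # P(X = k) for a Poisson distribution with mean mu
    return math.exp(-mu) * mu**k / math.factorial(k)

def probability(event, mu, kmax=100):
    # Sum the pmf over the outcomes where the event holds.
    # kmax truncates the infinite support (assumption: the tail is negligible).
    return sum(poisson_pmf(k, mu) for k in range(kmax + 1) if event(k))

c = lambda x: x**2 < 30
a = lambda x: x > 1
mu = 2.0

# Right-hand side of the identity: P(c and a) / P(a)
ratio = probability(lambda x: c(x) and a(x), mu) / probability(a, mu)

# Left-hand side computed directly: restrict the sample space to a,
# renormalise, then measure c inside that restricted space.
restricted = {k: poisson_pmf(k, mu) for k in range(101) if a(k)}
direct = sum(w for k, w in restricted.items() if c(k)) / sum(restricted.values())

print(ratio, direct)
```

Both numbers agree, as the Mathematica comparison does, but a test on one distribution says nothing about the general rule.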

Related

Sympy nsolve vs Mathematica NSolve for multivariate polynomial equations

An interesting feature of NSolve[] in Mathematica is that it seems to provide all the solutions it can find (and hopefully it is exhaustive). For instance, as stated in the examples:
NSolve[{x^2 + y^3 == 1, 2 x + 3 y == 4}, {x, y}]
would return an array of 3 solutions.
From what I could try, it seems to scale quite well even for, say, 20 multivariate polynomial equations with 20 variables as it can be seen in this notebook.
Alternatively, I am quite fond of using Sympy, which also features a kind of nsolve function.
But there is a catch: this function requires a starting point "x0" and it would possibly find only one solution - and still, provided you are lucky enough to have chosen a proper x0.
Some users suggested in the past to use a "multi-start method" where one would choose a grid of potential starting points and run nsolve multiple times. But this doesn't seem to fit my problem: if the grid has d points per variable, then the number of starting points grows exponentially, as d^20 for my own problems with 20 variables. That doesn't compare well with Mathematica, which seems to run in the blink of an eye.
What is Mathematica doing to achieve such fast solving? Is it due to the nature of the equations? (Maybe some Groebner basis computations behind the scenes.)
Could it be done with Sympy?
Thank you for your help!
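To make the "multi-start method" mentioned above concrete, here is a minimal sketch in plain Python for the two-equation example from the question. It runs Newton's method from a grid of complex starting points and keeps the distinct limits. The grid values, tolerances and function names are assumptions chosen for the illustration, and this is not a description of what NSolve does internally. With g grid values per real and imaginary part and v variables, the number of starts is g^(2v), which is exactly the exponential growth that makes the approach impractical for 20 variables.

```python
import itertools

def F(x, y):
    # The system x^2 + y^3 == 1, 2x + 3y == 4 written as residuals
    return (x * x + y * y * y - 1, 2 * x + 3 * y - 4)

def newton(x, y, steps=100, tol=1e-12):
    # Newton iteration for the 2x2 system; returns None if it does not converge
    for _ in range(steps):
        f1, f2 = F(x, y)
        # Jacobian [[a, b], [c, d]] = [[2x, 3y^2], [2, 3]]
        a, b, c, d = 2 * x, 3 * y * y, 2, 3
        det = a * d - b * c
        if abs(det) < 1e-14:
            return None
        # Solve J * (dx, dy) = -(f1, f2) with Cramer's rule
        dx = (-f1 * d + f2 * b) / det
        dy = (-f2 * a + f1 * c) / det
        x, y = x + dx, y + dy
        if abs(dx) + abs(dy) < tol:
            return (x, y)
    return None

grid = [-3.0, -1.0, 1.0, 3.0]   # assumed grid for each real and imaginary part
solutions = []
for xr, xi, yr, yi in itertools.product(grid, repeat=4):
    sol = newton(complex(xr, xi), complex(yr, yi))
    if sol is None:
        continue
    # Keep the limit only if it is not already in the list
    if all(abs(sol[0] - s[0]) + abs(sol[1] - s[1]) > 1e-6 for s in solutions):
        solutions.append(sol)

for x, y in solutions:
    print(x, y)
```

Here 4^4 = 256 starts are enough to recover one real solution and a complex-conjugate pair, the same count of three solutions that the question reports for NSolve.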

How to find the best (most informative) plotting limits?

I am developing a 2D plotting program for functions of 1 variable. It is designed to be very simple so that the user should not have to select the initial plot limits (or "range").
Are there known algorithms that can find the most interesting plot limits, knowing only the function f(x) ?
Notes:
The definition of interesting plot limits is not well defined here. This is part of the question: what is the most interesting part of the plot?
I already have an algorithm to determine the range of x values where the function f has finite values.
I am using Javascript, but any language is ok.
I don't want to use existing libraries.
The function f is restricted to expressions that the user can write with the basic math operators + - * / ^ and functions exp log abs sqrt sin cos tan acos asin atan ceil floor.
Using the Google graphs, you can get some examples of automatic limits. Typing graph sin(x) works pretty well, but graph exp(x) and graph log(x) don't really give the best results. Also, graph sin(x*100)*exp(-x^2) does not choose the limits I would qualify as the most informative. But it would be good enough for me.
UPDATE:
I found that PlotRange in Mathematica does that automatically very well (see here). Is the source code available, or a reference explaining the algorithm? I could not find it anywhere.
UPDATE:
I started using an adaptive refinement algorithm to find informative plot ranges, inspired by this site. It is not working perfectly yet, but the current progress is implemented in my project here. You can try plotting a few functions and see how it works. When I have a fully working version I can post an answer.
I don't have a complete answer, but I might have some useful ideas to start with.
For me, interesting parts of a graph include:
All the roots of the function, except where there is an infinite number of roots (where we might be interested in no more than 8 of each).
All the roots of the first and second derivatives of the function, again except where there is an infinite number of roots.
The behaviour of the function around x = 0.
Locations of asymptotes, though I wouldn't want the graph to plot all the way to infinity.
To see the features of the graph, I'd like it to take up a "reasonable" amount of the rectangular graphing window. I think that might be achieved by aiming to have the integral of the absolute value of the function over the plotted range be in the range of, say, 20-80% of the area of the graphing window.
Thus, a sketch of a heuristic for setting plot limits might be something like:
Find the range that includes all the roots of the function, its first and second derivatives, or (for functions with infinite numbers of roots) the (say) 8 roots closest to x=0.
If the range does not already include x=0, expand the range to include x=0.
Expand the x range by, say, 10% in each direction to ensure that all "interesting" points are well within the window.
Set the y range so that the integral of the absolute value of the function is (say) 30% of the area of the graphing window.
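The four steps above could be sketched as follows. This is a rough illustration in Python rather than the asker's JavaScript, and several details are assumptions that the answer does not specify: roots are searched only inside a fixed interval [-50, 50] by scanning for sign changes and bisecting, derivatives are approximated by central differences, the y window is taken symmetric about 0, and f is assumed to be finite on the final x range. A sign change across a pole (as in tan) is reported like a root, which happens to keep asymptotes in view.

```python
import math

def scan_roots(g, lo, hi, n=4000):
    # Roots of g on [lo, hi]: scan a uniform grid for sign changes, then bisect
    roots = []
    step = (hi - lo) / n
    for i in range(n):
        a, b = lo + i * step, lo + (i + 1) * step
        try:
            ga, gb = g(a), g(b)
        except (ValueError, ZeroDivisionError, OverflowError):
            continue  # g is undefined here (e.g. log of a negative number)
        if ga == 0:
            roots.append(a)
        elif ga * gb < 0:
            for _ in range(60):
                m = 0.5 * (a + b)
                gm = g(m)
                if ga * gm <= 0:
                    b = m
                else:
                    a, ga = m, gm
            roots.append(0.5 * (a + b))
    return roots

def suggest_limits(f, search=(-50.0, 50.0), max_roots=8, pad=0.10, fill=0.30, h=1e-3):
    d1 = lambda x: (f(x + h) - f(x - h)) / (2 * h)            # first derivative
    d2 = lambda x: (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)  # second derivative
    points = [0.0]  # step 2: the range always includes x = 0
    for g in (f, d1, d2):
        # step 1: keep at most max_roots roots, the ones closest to x = 0
        points.extend(sorted(scan_roots(g, *search), key=abs)[:max_roots])
    xmin, xmax = min(points), max(points)
    width = (xmax - xmin) or 2.0  # fallback width when nothing but x = 0 was found
    xmin, xmax = xmin - pad * width, xmax + pad * width  # step 3: 10% margin
    # step 4: trapezoid estimate of the integral of |f|, then choose the window
    # height so that this integral is `fill` times the window area
    n = 1000
    dx = (xmax - xmin) / n
    ys = [abs(f(xmin + i * dx)) for i in range(n + 1)]
    area = dx * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    half_height = area / (fill * (xmax - xmin) * 2)
    return (xmin, xmax), (-half_height, half_height)

x_range, y_range = suggest_limits(lambda x: x * x - 4)
print(x_range, y_range)
```

For f(x) = x^2 - 4 the interesting points are the roots at x = -2 and x = 2 and the minimum at x = 0, so the suggested x range is that interval widened by 10% on each side.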

Locally weighted logistic regression

I have been trying to implement a locally-weighted logistic regression algorithm in Ruby. As far as I know, no library currently exists for this algorithm, and there is very little information available, so it's been difficult.
My main resource has been the dissertation of Dr. Kan Deng, in which he described the algorithm in what I feel is pretty light detail. My work so far on the library is here.
I've run into trouble when trying to calculate B (beta). From what I understand, B is a (1+d x 1) vector that represents the local weighting for a particular point. After that, pi (the probability of a positive output) for that point is the sigmoid function based on the B for that point. To get B, use the Newton-Raphson algorithm recursively a certain number of times, probably no more than ten.
Equation 4-4 on page 66, the Newton-Raphson algorithm itself, doesn't make sense to me. Based on my understanding of what X and W are, (x.transpose * w * x).inverse * x.transpose * w should be a (1+d x N) matrix, which doesn't match up with B, which is (1+d x 1). The only way that would work, then, is if e were a (N x 1) vector.
At the top of page 67, under the picture, though, Dr. Deng just says that e is a ratio, which doesn't make sense to me. Is e Euler's Constant, and it just so happens that that ratio is always 2.718:1, or is it something else? Either way, the explanation doesn't seem to suggest, to me, that it's a vector, which leaves me confused.
The use of pi' is also confusing to me. Equation 4-5, the derivative of the sigmoid function w.r.t. B, gives a constant multiplied by a vector, or a vector. From my understanding, though, pi' is just supposed to be a number, to be multiplied by w and form the diagonal of the weight matrix W.
So, my two main questions here are: what is e on page 67, and is that the N x 1 vector I need; and how does pi' in equation 4-5 end up a number?
I realize that this is a difficult question to answer, so if there is a good answer then I will come back in a few days and give it a fifty point bounty. I would send an e-mail to Dr. Deng, but I haven't been able to find out what happened to him after 1997.
If anyone has any experience with this algorithm or knows of any other resources, any help would be much appreciated!
As far as I can see, this is just a version of Logistic regression in which the terms in the log-likelihood function have a multiplicative weight depending on their distance from the point you are trying to classify. I would start by getting familiar with an explanation of logistic regression, such as http://czep.net/stat/mlelr.pdf. The "e" you mention seems to be totally unconnected with Euler's constant - I think he is using e for error.
If you can call Java from Ruby, you may be able to make use of the logistic classifier in Weka described at http://weka.sourceforge.net/doc.stable/weka/classifiers/functions/Logistic.html - this says "Although original Logistic Regression does not deal with instance weights, we modify the algorithm a little bit to handle the instance weights." If nothing else, you could download it and look at its source code. If you do this, note that it is a fairly sophisticated approach - for instance, they check beforehand to see if all the points actually lie pretty much in some subspace of the input space, and project down a few dimensions if they do.
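To illustrate the weighted-likelihood reading described in this answer, here is a minimal sketch in Python (not Ruby) for a single input dimension. It follows the standard Newton-Raphson update for weighted logistic regression, which may differ in notation from Dr. Deng's equation 4-4. The Gaussian kernel, the bandwidth tau and the toy data are assumptions for the illustration. Under this reading, e is the N x 1 vector of errors y_i - pi_i, and pi_i' = pi_i * (1 - pi_i) is one number per data point. If the dissertation's "ratio" is e_i = (y_i - pi_i) / pi_i', which is a guess, then multiplying by W = diag(w_i * pi_i') gives the same gradient as the one computed below.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def local_logistic(xs, ys, query, tau=2.0, iterations=10):
    # Fit beta = (b0, b1) around `query`; return (beta, probability at query)
    # Kernel weights: points near the query count more (assumed Gaussian kernel)
    w = [math.exp(-((x - query) ** 2) / (2 * tau ** 2)) for x in xs]
    b0 = b1 = 0.0
    for _ in range(iterations):
        pi = [sigmoid(b0 + b1 * x) for x in xs]
        e = [y - p for y, p in zip(ys, pi)]              # error vector, length N
        d = [wi * p * (1 - p) for wi, p in zip(w, pi)]   # diagonal of W
        # Gradient of the weighted log-likelihood: X^T diag(w) e
        g0 = sum(wi * ei for wi, ei in zip(w, e))
        g1 = sum(wi * ei * x for wi, ei, x in zip(w, e, xs))
        # X^T W X, a 2x2 matrix because each row of X is (1, x_i)
        h00 = sum(d)
        h01 = sum(di * x for di, x in zip(d, xs))
        h11 = sum(di * x * x for di, x in zip(d, xs))
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break
        # Newton step: beta += (X^T W X)^-1 * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return (b0, b1), sigmoid(b0 + b1 * query)

# Toy data (made up for the illustration): mostly 0 on the left, 1 on the right
xs = [-3, -2, -1, 0, 1, 2, 3, -1, 1]
ys = [0, 0, 0, 0, 1, 1, 1, 1, 0]
beta_hi, p_hi = local_logistic(xs, ys, query=2.0)
beta_lo, p_lo = local_logistic(xs, ys, query=-2.0)
print(p_hi, p_lo)
```

With d input dimensions, beta has 1 + d entries and the 2x2 solve becomes a (1 + d) x (1 + d) linear solve; the shapes then match the (1+d x N) times (N x 1) product described in the question.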

Can a program be used to simplify algebraic expressions?

We know 1+2+...+n is equal to n(n+1)/2.
But can we get the same result programmatically if we don't know it in advance?
About why I have such a question.
Think of a more complex situation:
X1+X2+...+Xk=n, where Xi is integer and >= 0.
What's the Expectation of X1^2+...+Xk^2?
The result is not obvious at a glance, and we'll want to feed it to a program to reduce the algebra once we've worked out the (verbose) mathematical representation of the Expectation of X1^2+...+Xk^2.
Perhaps you're thinking of a computer algebra system (CAS)? WolframAlpha is a free online one that uses Mathematica (a very powerful CAS) on its backend. Here you can see it compute/simplify your expression: WolframAlpha.
Your example is just the sum of squares which has a pretty simple explicit formula: n(n+1)(2n+1)/6. More generally, you can use Faulhaber's formula to calculate Sum of n^p.
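One way to recover such formulas programmatically, without a full CAS, is exact polynomial interpolation. The sketch below relies on the fact (which Faulhaber's formula states) that 1^p + 2^p + ... + n^p is a polynomial in n of degree p + 1, so p + 2 sample values determine it. The function name and the Gauss-Jordan solve on a Vandermonde system are choices made for the illustration; it uses only Python's fractions module.

```python
from fractions import Fraction

def power_sum_polynomial(p):
    # Coefficients c[0..p+1] such that sum_{k=1}^n k^p == sum_j c[j] * n^j
    m = p + 2
    rows = []
    for n in range(m):
        s = sum(Fraction(k) ** p for k in range(1, n + 1))
        # One row of the augmented system: [1, n, n^2, ..., n^(p+1) | s]
        rows.append([Fraction(n) ** j for j in range(m)] + [s])
    # Gauss-Jordan elimination with exact rational arithmetic
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [v / pv for v in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[col])]
    return [rows[j][m] for j in range(m)]

print(power_sum_polynomial(1))  # coefficients of n(n+1)/2
print(power_sum_polynomial(2))  # coefficients of n(n+1)(2n+1)/6
```

For p = 1 the coefficients are 0, 1/2, 1/2, i.e. n/2 + n^2/2 = n(n+1)/2, and for p = 2 they are 0, 1/6, 1/2, 1/3, which expands n(n+1)(2n+1)/6. This only works because the degree is known in advance; it is not a general simplifier.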
Okay, first some suggestions about the math part of the question and then some about the software development.
There's an e-book "A=B" by Marko Petkovšek, Herbert S. Wilf and Doron Zeilberger which deals with solving (or showing there is no solution of) summation problems even more difficult than just polynomials. A review of the book by Ian Wanless is worth a quick reading. The e-book is freely downloadable, but bound copies can be purchased, e.g. from Amazon.
A 2004 Trans. of AMS paper Closed Form Summation of C-finite Sequences by Greene and Wilf is also available online.
In general you will need some basic CAS software to implement these algorithms, and it sounds like the goal is to develop such software yourself. I would recommend studying some of the open source CAS (computer algebra software) packages like Maxima or Axiom to get a feel for the scope of what is involved. Of course it's likely that a narrowly targeted application can do with only a fraction of what these fairly mature and high-end packages implement, but I don't feel that I can point you down a more directed path given the current phrasing of the question.
If "Expectation" of expressions is included in the scope of your project, there are a number of complications piled on top of mere algebraic manipulation. One certainly needs to be able to specify probability density functions to support expected values, and presumably some integration software (though potentially limiting the choice of parameterized distributions could lead to a simplified problem of looking up moments of those distributions). I do think this is a particularly nice application to jump into, as seemingly simple expressions (sums, max/min) of random variables can lead to nightmarish consideration of cases, well-suited to the patience of a computer.
EDIT, due to your recent clarification of the post.
You won't get away with a hand-made solution unless you have a whole team of PhDs and several years to spend. The best advice I can give you is to buy a Mathematica (or other) license and interface it with your program.
If you are a Lisp programmer, using Maxima is another potential solution (and a free one).
If you want background on the state of art in summation algorithms, this paper is a good start.
X1+X2+...+Xk=n, where Xi is integer and >= 0.
What's the Expectation of X1^2+...+Xk^2?
Problems of this kind keep a lot of people busy just figuring out how to do them on paper.
Let us take k = 2. Then X_1 + X_2 = n gives X_2 = n - X_1.
So the quantity whose expectation is to be computed is X_1^2 + (n - X_1)^2 = 2 X_1^2 - 2n X_1 + n^2.
This reads
E = sum(p_k * (2 * k^2 - 2 * n * k + n^2), k = 0..infinity)
where p_k = Prob(X_1 = k). Sums of this kind, depending on p_k, are generally very difficult to compute. I'd say that the problem is even more difficult than computing integrals in closed form (for which no software fully implements the available, but undecidable, Risch algorithm).
To convince yourself, take e.g. p_k = 1 / (log(k) * k^4).
Finding a formula (or a formula generator) for it is at the very least a very difficult research problem.
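For small cases the expectation can at least be computed exactly by brute force, which gives a way to test any candidate formula. The question does not say how the X_i are distributed, so the sketch below assumes that every nonnegative integer solution of X1+...+Xk = n is equally likely; that assumption and the function names are for illustration only. The second function evaluates the k = 2 sum from this answer for a given p_k, summing only up to n because X_1 cannot exceed n.

```python
from fractions import Fraction
from itertools import product

def expected_sum_of_squares(n, k):
    # E[X1^2 + ... + Xk^2], assuming all solutions of X1+...+Xk = n with
    # Xi >= 0 are equally likely (an assumption, not stated in the question)
    solutions = [xs for xs in product(range(n + 1), repeat=k) if sum(xs) == n]
    total = sum(sum(x * x for x in xs) for xs in solutions)
    return Fraction(total, len(solutions))

def expected_k2(n, p):
    # The k = 2 sum from the answer: sum over j of p(j) * (2j^2 - 2nj + n^2)
    return sum(p(j) * (2 * j * j - 2 * n * j + n * n) for j in range(n + 1))

n = 3
brute = expected_sum_of_squares(n, 2)
by_formula = expected_k2(n, lambda j: Fraction(1, n + 1))  # uniform p_k
print(brute, by_formula)
```

Under the uniform assumption and k = 2, both agree with n(2n+1)/3, since E[X_1^2] is the average of 0^2, ..., n^2, namely n(2n+1)/6. A different choice of p_k, such as the one suggested above, would not collapse to a closed form this easily.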

Efficient evaluation of hypergeometric functions

Does anyone have experience with algorithms for evaluating hypergeometric functions? I would be interested in general references, but I'll describe my particular problem in case someone has dealt with it.
My specific problem is evaluating a function of the form 3F2(a, b, 1; c, d; 1) where a, b, c, and d are all positive reals and c+d > a+b+1. There are many special cases that have a closed-form formula, but as far as I know there are no such formulas in general. The power series centered at zero converges at 1, but very slowly; the ratio of consecutive coefficients goes to 1 in the limit. Maybe something like Aitken acceleration would help?
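The slow convergence is easy to see by summing the series directly. Because the numerator parameter 1 cancels the n! in the denominator, the ratio of consecutive terms at z = 1 is (a+n)(b+n) / ((c+n)(d+n)). The sketch below is plain Python with an assumed function name and term counts. It is checked against a special case with a closed form: for a = c the series reduces to 2F1(b, 1; d; 1), which by Gauss's summation theorem equals (d-1)/(d-b-1).

```python
def hyp3f2_at_one(a, b, c, d, terms=100_000):
    # Partial sum of 3F2(a, b, 1; c, d; 1); requires c + d > a + b + 1
    total = 0.0
    term = 1.0
    for n in range(terms):
        total += term
        # Ratio of consecutive terms; the (1)_n factor cancels the n!
        term *= (a + n) * (b + n) / ((c + n) * (d + n))
    return total

# Special case a = c: exact value is (d - 1) / (d - b - 1) = 4/3
a, b, c, d = 1.5, 0.5, 1.5, 3.0
exact = (d - 1) / (d - b - 1)
for terms in (100, 10_000, 100_000):
    approx = hyp3f2_at_one(a, b, c, d, terms)
    print(terms, approx, abs(approx - exact))
```

The terms decay only like a power of n (here roughly n^-2.5), so the error of the partial sum shrinks like a power of the number of terms rather than geometrically. When c + d is only slightly larger than a + b + 1, the decay is slower still.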
I tested Aitken acceleration and it does not seem to help for this problem (nor does Richardson extrapolation). This probably means Padé approximation doesn't work either. I might have done something wrong though, so by all means try it for yourself.
I can think of two approaches.
One is to evaluate the series at some point such as z = 0.5 where convergence is rapid to get an initial value and then step forward to z = 1 by plugging the hypergeometric differential equation into an ODE solver. I don't know how well this works in practice; it might not, due to z = 1 being a singularity (if I recall correctly).
The second is to use the definition of 3F2 in terms of the Meijer G-function. The contour integral defining the Meijer G-function can be evaluated numerically by applying Gaussian or doubly-exponential quadrature to segments of the contour. This is not terribly efficient, but it should work, and it should scale to relatively high precision.
Is it correct that you want to sum a series where you know the ratio of successive terms and it is a rational function?
I think Gosper's algorithm and the rest of the tools for proving hypergeometric identities (and finding them) do exactly this, right? (See Wilf and Zeilberger's A=B book online.)

Resources