Conditional expectation with sympy - probability

How can I calculate the conditional expectation of a random variable in sympy? I read this and tried:
from sympy.stats import *
v = Uniform("v",0,1)
E(v)
this returns correctly 1/2, but then:
E(v, v>1/2)
returns NaN. I also tried:
E(v, where(v > 1/2))
it returned 1/2, which is incorrect (it should be 3/4).
What am I doing wrong?

This issue (which I see you already reported) is specific to uniformly distributed random variables. (There's also an older issue involving Uniform.) For other distributions, what you did works correctly:
>>> from sympy.stats import *
>>> x = Exponential("x", 1)
>>> E(x, x < 2)
-3/(-1 + exp(2)) + exp(2)/(-1 + exp(2))
As for the uniform type, a workaround for now is to remember that conditioning a uniformly distributed random variable to some interval creates another uniformly distributed random variable.
So the value of E(v, v > 1/2) can be found by computing
E(Uniform("x", 1/2, 1))
which returns 0.75.
Caution: if working interactively, one may want to eventually import from core SymPy, in addition to its stats module. Since E stands for Euler's number 2.718... in SymPy, one may end up unable to compute expectations with
TypeError: 'Exp1' object is not callable
So one either has to be more specific about what to import, or use namespace for one or both modules. My preferred solution is
from sympy import *
import sympy.stats as st
So that st.E is expectation while E is 2.718...

Related

Curve fitting with nonlinear constraints with scipy.optimize.minimize

Let's say that we want to fit a nonlinear model to our data (that include measurement errors) with certain constraints on the fit parameters among themselves. Without these constraints, scipy.optimize.curve_fit does the job as it supports by default the optimization of the
chisq = sum((r / sigma) ** 2)
where r is the difference between the true and predicted value. Long story short, the user defines a function and pass it to the solver which minimizes the chisq loss. The situation gets a bit more complex when there are other restrictions imposed on the function parameters, e.g. for a linear model y=ax+b, we can demand a fixed nonlinear condition such as a^2+b=1. A function scipy.optimization.minimize is just what the doctor ordered, provided that we pass the chisq as argument. Below is the code for using minimize found here scipy.optimize with non linear constraints. I'm struggling very hard in building the chisq properly to make the code work. The most general question would how does one defines in the most pythonic way a custom loss that would be minimized?
Help please.
from math import cos, atan
import numpy as np
from scipy.optimize import minimize
def f(x):
return 0.1 * x[0] * x[1]
def ineq_constraint(x):
return x[0]**2 + x[1]**2 - (5. + 2.2 * cos(10 * atan(x[0] / x[1])))**2
con = {'type': 'ineq', 'fun': ineq_constraint}
x0 = [1, 1]
res = minimize(f, x0, method='SLSQP', constraints=con)
Since we minimize the chisq function that is found by optimizing our targeted function, I guess the chisq will be defined in a very elegant way in a form of a wrapper since for arguments it takes the targeted function, a column of true values and a column of uncertanties e.g.
chisq(target_function, (y, y_err)).

Computing a single element of the adjugate or inverse of a symbolic binary matrix

I'm trying to get a single element of an adjugate A_adj of a matrix A, both of which need to be symbolic expressions, where the symbols x_i are binary and the matrix A is symmetric and sparse. Python's sympy works great for small problems:
from sympy import zeros, symbols
size = 4
A = zeros(size,size)
x_i = [x for x in symbols(f'x0:{size}')]
for i in range(size-1):
A[i,i] += 0.5*x_i[i]
A[i+1,i+1] += 0.5*x_i[i]
A[i,i+1] = A[i+1,i] = -0.3*(i+1)*x_i[i]
A_adj_0 = A[1:,1:].det()
A_adj_0
This calculates the first element A_adj_0 of the cofactor matrix (which is the corresponding minor) and correctly gives me 0.125x_0x_1x_2 - 0.28x_2x_2^2 - 0.055x_1^2x_2 - 0.28x_1x_2^2, which is the expression I need, but there are two issues:
This is completely unfeasible for larger matrices (I need this for sizes of ~100).
The x_i are binary variables (i.e. either 0 or 1) and there seems to be no way for sympy to simplify expressions of binary variables, i.e. simplifying polynomials x_i^n = x_i.
The first issue can be partly addressed by instead solving a linear equation system Ay = b, where b is set to the first basis vector [1, 0, 0, 0], such that y is the first column of the inverse of A. The first entry of y is the first element of the inverse of A:
b = zeros(size,1)
b[0] = 1
y = A.LUsolve(b)
s = {x_i[i]: 1 for i in range(size)}
print(y[0].subs(s) * A.subs(s).det())
print(A_adj_0.subs(s))
The problem here is that the expression for the first element of y is extremely complicated, even after using simplify() and so on. It would be a very simple expression with simplification of binary expressions as mentioned in point 2 above. It's a faster method, but still unfeasible for larger matrices.
This boils down to my actual question:
Is there an efficient way to compute a single element of the adjugate of a sparse and symmetric symbolic matrix, where the symbols are binary values?
I'm open to using other software as well.
Addendum 1:
It seems simplifying binary expressions in sympy is possible with a simple custom substitution which I wasn't aware of:
A_subs = A_adj_0
for i in range(size):
A_subs = A_subs.subs(x_i[i]*x_i[i], x_i[i])
A_subs
You should make sure to use Rational rather than floats in sympy so S(1)/2 or Rational(1, 2) rather than 0.5.
There is a new (undocumented and for the moment internal) implementation of matrices in sympy called DomainMatrix. It is likely to be a lot faster for a problem like this and always produces polynomial results in a fully expanded form. I expect that it will be much faster for this kind of problem but it still seems to be fairly slow for this because is is not sparse internally (yet - that will probably change in the next release) and it does not take advantage of the simplification from the symbols being binary-valued. It can be made to work over GF(2) but not with symbols that are assumed to be in GF(2) which is something different.
In case it is helpful though this is how you would use it in sympy 1.7.1:
from sympy import zeros, symbols, Rational
from sympy.polys.domainmatrix import DomainMatrix
size = 10
A = zeros(size,size)
x_i = [x for x in symbols(f'x0:{size}')]
for i in range(size-1):
A[i,i] += Rational(1, 2)*x_i[i]
A[i+1,i+1] += Rational(1, 2)*x_i[i]
A[i,i+1] = A[i+1,i] = -Rational(3, 10)*(i+1)*x_i[i]
# Convert to DomainMatrix:
dM = DomainMatrix.from_list_sympy(size-1, size-1, A[1:, 1:].tolist())
# Compute determinant and convert back to normal sympy expression:
# Could also use dM.det().as_expr() although it might be slower
A_adj_0 = dM.charpoly()[-1].as_expr()
# Reduce powers:
A_adj_0 = A_adj_0.replace(lambda e: e.is_Pow, lambda e: e.args[0])
print(A_adj_0)

Python Gekko - How to use built-in maximum function with sequential solvers?

When solving models sequentially in Python GEKKO (i.e. with IMODE >= 4) fails when using the max2 and max3 functions that come with GEKKO.
This is for use cases, where np.maximum or the standard max function treat a GEKKO parameter like an array, which is not always the intended usage or can create errors when comparing against integers for example.
minimal code example:
from gekko import GEKKO
import numpy as np
m = GEKKO()
m.time = np.arange(0,20)
y = m.Var(value=5)
forcing = m.Param(value=np.arange(-5,15))
m.Equation(y.dt()== m.max2(forcing,0) * y)
m.options.IMODE=4
m.solve(disp=False)
returns:
Exception: #error: Degrees of Freedom
* Error: DOF must be zero for this mode
STOPPING...
I know from looking at the code that both max2 and max3 use inequality expressions in the equations, which understandably introduces the degrees of freedoms, so was this functionality never intended? Could there be some workaround to fix this?
Any help would be much appreciated!
Note:
I hope this is not a duplicate of How to define maximum of Intermediate and another value in Python Gekko, when using sequential solver?, but instead asking a more concise & different question, about essentially the same issue.
You can get a successful solution by switching to IMODE=6. IMODE=4 (simultaneous simulation) or IMODE=7 sequential simulation requires zero degrees of freedom. Both m.max2() and m.max3() require degrees of freedom and an optimizer to solve.
from gekko import GEKKO
import numpy as np
m = GEKKO(remote=False)
m.time = np.arange(0,20)
y = m.Var(value=5)
forcing = m.Param(value=np.arange(-5,15))
m.Equation(y.dt()== -m.max2(forcing,0) * y)
m.options.IMODE=6
m.solve(disp=True)
The equation y.dt()== -m.max2(forcing,0) * y exponentially increases beyond machine precision so I switched the equation to something that can solve.

Random value from two seeds

Have a twodimensional grid and need a reproducible, random value for every integer coordinate on this grid. This value should be as unique as possible. In a grid of, let's say 1000 x 1000 it shouldn't occur twice.
To put it more mathematical: I'd need a function f(x, y) which gives an unique number no matter what x and y are as long as they are each in the range [0, 1000]
f(x, y) has to be reproducible and not have side-effects.
Probably there is some trivial solution but everything that comes to my mind, like multiplying x and y, adding some salt, etc. does not lead anywhere because the resulting number can easily occur multiple times.
One working solution I got is to use a randomizer and simply compute ALL values in the grid, but that is too computationally heavy (to do every time a value is needed) or requires too much memory in my case (I want to avoid pre-computing all the values).
Any suggestions?
Huge thanks in advance.
I would use the zero-padded concatenation of your x and y as a seed for a built-in random generator. I'm actually using something like this in some of my current experiments.
I.e. x = 13, y = 42 would become int('0013' + '0042') = 130042 to use as random seed. Then you can use the random generator of your choice to get the kind (float, int, etc) and range of values you need:
Example in Python 3.6+:
import numpy as np
from itertools import product
X = np.zeros((1000, 1000))
for x, y in product(range(1000), range(1000)):
np.random.seed(int(f'{x:04}{y:04}'))
X[x, y] = np.random.random()
Each value in the grid is randomly generated, but independently reproducible.

Pyomo summation of a product of a matrix by a vector

I edit my code including all the parameters and variables involved:
(D is a numpy matrix imported from Python)
import pyomo
from pyomo.environ import *
from array import *
import numpy as np
import scipy as sp
from diff_matrix import D ##N=10????
print(D)
m =ConcreteModel()
...
m.n = Param(initialize = 10, within = Integers)
m.Ns = Set(initialize = range(0,value(m.n)))
m.x1 = Var(m.N, domain = Reals)
m.D = Param(m.N, m.N, initialize=D)
m.f_x1 = Var(m.N)
def f_x1_definition(model,i):
return m.f_x1[i] == sum(m.x1[j]*m.D[i,j] for j in range(value(m.n)))
m.f_x1_const = Constraint(m.Ns, rule = f_x1_definition)
But I get the next error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Any help?
The simplest thing is to just use the Python sum() function instead of the Pyomo summation() function:
def f_x1_definition(model,i):
return model.f_x1[i] == sum(model.x1[j]*model.D[i,j] for j in range(value(model.n)))
Also, note that I reversed the order of the Pyomo Var (m.x1) and the matrix (m.D). Based on your other questions (Importing a matrix from Python to Pyomo), I am assuming that the matrix is a NumPy matrix. When multiplying a NumPy value and a Pyomo component (Var or Param), always put the Pyomo object first. This is due to a conflict between the NumPy operator overloading and the Pyomo operator overloading in current versions of Pyomo (up through at least 5.1).
EDIT 1: Note on reversing the order of operands: in your original question, it was not clear that m.D was being defined as a Pyomo Param. There is no concern with the order of Pyomo objects in expressions. The operator overloading problem mentioned above is only when multiplying NumPy objects with Pyomo components. Further, at this time (up through Pyomo 5.1), Pyomo does not support matrix algebra - that is, operations like matrix-matrix or matrix-vector products. Since every expression is a scalar expression, the ordering of the terms in a commutative operation (+, *) does not change the meaning of the expression.
EDIT 2: Your error has nothing to do with the sum/summation you originally posted. The problem is with how you are initializing your Param. At this time (up through Pyomo 5.1), you cannot directly initialize a Param from a numpy.ndarray. You need to first convert the NumPy object into a Python dictionary with something like:
m.D = Param(m.N, m.N, initialize=dict(((i,j),D[i,j]) for i in m.N for j in m.N))

Resources