Fixing the intercept in statsmodels ols - statsmodels

In Python's statsmodels.formula.api, the ols functionality automatically includes and estimates an intercept:
results = sm.ols(formula="s ~ x + y + z", data=somedata).fit()
results.params
(* Intercept 0.632646, x -1.258761, y 0.465076, z 0.497991 *)
Because I'm using it in a linear probability model, is there any way to fix the intercept to 0.5?

You can reproduce this behavior in 2 steps:
Subtract the predefined_intercept from your targets
Fit OLS without intercept: include "-1" in your formula
Minimal example:
from statsmodels.formula.api import ols
import pandas as pd
import numpy as np
n_samples = 100
predefined_intercept = 0.5
somedata = pd.DataFrame(np.random.random((n_samples, 3)), columns = ['x', 'y', 'z'])
somedata['s'] = somedata['x'] - 2 * somedata['y'] + 5 * somedata['z'] - predefined_intercept
results = ols(formula="s ~ x + y + z - 1", data=somedata).fit()
print(results.params)
Output:
x 0.671561
y -2.315076
z 4.759542
See an official example notebook on formulas for detailed explanations and more.
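To check that the two steps recover known coefficients, here is a minimal NumPy-only sketch. It assumes synthetic, noise-free data whose true intercept really is 0.5. The names `fixed_intercept`, `true_coefs` and `s_hat` are illustrative and not part of statsmodels.

```python
import numpy as np

rng = np.random.default_rng(0)
fixed_intercept = 0.5  # the value we want to impose (assumption for this sketch)

# Synthetic, noise-free data whose true intercept is exactly 0.5.
X = rng.random((100, 3))                 # columns play the role of x, y, z
true_coefs = np.array([1.0, -2.0, 5.0])
s = fixed_intercept + X @ true_coefs

# Step 1: subtract the fixed intercept from the target.
# Step 2: least squares with no constant column (the "- 1" in the formula).
coefs, *_ = np.linalg.lstsq(X, s - fixed_intercept, rcond=None)

# Predictions must add the fixed intercept back.
s_hat = fixed_intercept + X @ coefs
print(coefs)
```

When predicting with a model fitted this way, remember to add the fixed intercept back, as in the last step of the sketch.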

Related

Getting optimal control with economic cost function to converge

I have been using gekko to optimize a bioreactor using example 12 (https://apmonitor.com/wiki/index.php/Main/GekkoPythonOptimization) as a basis.
My model is slightly more complicated, with 6 states, 7 parameters and 2 manipulated variables. When I run it for small values of time (t ~ 20), the simulation is able to converge, albeit requiring a fine time resolution (dt < 0.1). However, when I try to extend the time (e.g., t = 30), it fails quite consistently with the following error:
EXIT: Converged to a point of local infeasibility. Problem may be infeasible
I have tried the following:
Employing different solvers with m.options.SOLVER = 1,2,3
Increasing m.options.MAX_ITER to 10000
Decreasing m.options.NODES to 1 (a lower-order discretization seems to help with convergence)
Supplying a reasonable initial guess to the MVs by specifying a value in the declaration:
D = m.MV(value=0.1,lb=0.0,ub=0.1). From some of the various posts, it seems this should help.
I am not too sure how to go about solving this. For a simplified model (3 states, 5 parameters and 2 MVs), gekko is able to optimize it quite well (though it fails somewhat when I try to go to large t) even though the rate constants of the simplified model are a subset of the full model.
My code is as follows:
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
#Parameters and IC
full_params = [0.027,2.12e-9,7.13e-3,168,168,0.035,1e-3]
full_x0 = [5e6,0.0,0.0,0.0,1.25e5,0.0]
mu,k1,k2,k3,k33,k4, f= full_params
#Initialize model
m = GEKKO()
#Time discretization
n_steps = 201
m.time = np.linspace(0,20,n_steps)
#Define MVs
D = m.MV(value=0.1,lb=0.0,ub=0.1)
D.STATUS = 1
D.DCOST = 0.0
Tin = m.MV(value=1e7,lb=0.0,ub=1e7)
Tin.STATUS = 1
Tin.DCOST = 0.0
#Define States
T = m.Var(value=full_x0[0])
Id = m.Var(value=full_x0[1])
Is = m.Var(value=full_x0[2])
Ic = m.Var(value=full_x0[3])
Vs = m.Var(value=full_x0[4])
Vd = m.Var(value=full_x0[5])
#Define equations
m.Equation(T.dt() == mu*T -k1*(Vs+Vd)*T + D*(Tin-T))
m.Equation(Id.dt() == k1*Vd*T -(k1*Vs -mu)*Id -D*Id)
m.Equation(Is.dt() == k1*Vs*T -(k1*Vd + k2)*Is -D*Is)
m.Equation(Ic.dt() == k1*(Vs*Id + Vd*Is) -k2*Ic -D*Ic)
m.Equation(Vs.dt() == k3*Is - (k1*(T+Id+Is+Ic) + k4 + D)*Vs)
m.Equation(Vd.dt() == k33*Ic + f*k3*Is - (k1*(T+Id+Is+Ic) + k4 + D)*Vd)
#Define objective function
J = m.Var(value=0) # objective (profit)
Jf = m.FV() # final objective
Jf.STATUS = 1
m.Connection(Jf,J,pos2="end")
m.Equation(J.dt() == D*(Vs + Vd))
m.Obj(-Jf)
m.options.IMODE = 6 # optimal control
m.options.NODES = 1 # collocation nodes
m.options.SOLVER = 3
m.options.MAX_ITER = 10000
#Solve
m.solve()
For clarity, the model equations are:
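Transcribed term by term from the m.Equation calls in the code above (an assumption about what the equations referred to here looked like), the system can be written as follows, where the last line is the profit integrated by the objective and J(t_f) is maximized:

```latex
\begin{aligned}
\frac{dT}{dt}   &= \mu T - k_1 (V_s + V_d)\, T + D\,(T_{in} - T) \\
\frac{dI_d}{dt} &= k_1 V_d T - (k_1 V_s - \mu)\, I_d - D\, I_d \\
\frac{dI_s}{dt} &= k_1 V_s T - (k_1 V_d + k_2)\, I_s - D\, I_s \\
\frac{dI_c}{dt} &= k_1 (V_s I_d + V_d I_s) - k_2 I_c - D\, I_c \\
\frac{dV_s}{dt} &= k_3 I_s - \bigl(k_1 (T + I_d + I_s + I_c) + k_4 + D\bigr) V_s \\
\frac{dV_d}{dt} &= k_{33} I_c + f\, k_3 I_s - \bigl(k_1 (T + I_d + I_s + I_c) + k_4 + D\bigr) V_d \\
\frac{dJ}{dt}   &= D\,(V_s + V_d)
\end{aligned}
```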
I would be grateful for any assistance e.g., how to implement the scaling of the parameters per https://apmonitor.com/do/index.php/Main/ModelInitialization. Thank you!
Try increasing the final time until the solver can no longer find a solution; tf=28, for example, is still successful. A plot of the solution reveals that Tin is driven to zero at about the time where the solution almost fails to converge. I added a couple of additional objective forms, but they didn't help convergence (see Objective Method #1 and #2). The values of J, Vs, Vd are high but not unmanageable by the solver. One way to think about scaling is as a change of units, such as changing the basis from kg/day to kg/s. Gekko automatically scales variables by the initial condition.
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
#Parameters and IC
full_params = [0.027,2.12e-9,7.13e-3,168,168,0.035,1e-3]
full_x0 = [5e6,0.0,0.0,0.0,1.25e5,0.0]
mu,k1,k2,k3,k33,k4, f= full_params
#Initialize model
m = GEKKO()
#Time discretization
tf = 28
n_steps = tf*10+1
m.time = np.linspace(0,tf,n_steps)
#Define MVs
D = m.MV(value=0.1,lb=0.0,ub=0.1)
D.STATUS = 1
D.DCOST = 0.0
Tin = m.MV(value=1e7,lb=0,ub=1e7)
Tin.STATUS = 1
Tin.DCOST = 0.0
#Define States
T = m.Var(value=full_x0[0])
Id = m.Var(value=full_x0[1])
Is = m.Var(value=full_x0[2])
Ic = m.Var(value=full_x0[3])
Vs = m.Var(value=full_x0[4])
Vd = m.Var(value=full_x0[5])
#Define equations
m.Equation(T.dt() == mu*T -k1*(Vs+Vd)*T + D*(Tin-T))
m.Equation(Id.dt() == k1*Vd*T -(k1*Vs -mu)*Id -D*Id)
m.Equation(Is.dt() == k1*Vs*T -(k1*Vd + k2)*Is -D*Is)
m.Equation(Ic.dt() == k1*(Vs*Id + Vd*Is) -k2*Ic -D*Ic)
m.Equation(Vs.dt() == k3*Is - (k1*(T+Id+Is+Ic) + k4 + D)*Vs)
m.Equation(Vd.dt() == k33*Ic + f*k3*Is - (k1*(T+Id+Is+Ic) + k4 + D)*Vd)
# Original Objective
if True:
    J = m.Var(value=0) # objective (profit)
    Jf = m.FV() # final objective
    Jf.STATUS = 1
    m.Connection(Jf,J,pos2="end")
    m.Equation(J.dt() == D*(Vs + Vd))
    m.Obj(-Jf)
# Objective Method 1
if False:
    p=np.zeros_like(m.time); p[-1]=1
    final = m.Param(p)
    J = m.Var(value=0) # objective (profit)
    m.Equation(J.dt() == D*(Vs + Vd))
    m.Maximize(J*final)
# Objective Method 2
if False:
    m.Maximize(D*(Vs + Vd))
m.options.IMODE = 6 # optimal control
m.options.NODES = 2 # collocation nodes
m.options.SOLVER = 3
m.options.MAX_ITER = 10000
#Solve
m.solve()
plt.figure(figsize=(10,8))
plt.subplot(3,1,1)
plt.plot(m.time,Tin.value,'r.-',label='Tin')
plt.legend(); plt.grid()
plt.subplot(3,1,2)
plt.semilogy(m.time,T.value,label='T')
plt.semilogy(m.time,Id.value,label='Id')
plt.semilogy(m.time,Is.value,label='Is')
plt.semilogy(m.time,Ic.value,label='Ic')
plt.legend(); plt.grid()
plt.subplot(3,1,3)
plt.semilogy(m.time,Vs.value,label='Vs')
plt.semilogy(m.time,Vd.value,label='Vd')
plt.semilogy(m.time,J.value,label='Objective')
plt.legend(); plt.grid()
plt.show()
Is there any type of constraint in the problem that would favor a decrease at the end? This may be the cause of the infeasibility at tf=30. Another way to get a feasible solution is to solve with m.options.TIME_SHIFT=20 and re-solve the problem, so that the initial conditions of the second solve equal the values at time step 20 of the prior solution.
#Solve
m.solve()
m.options.TIME_SHIFT=20
m.solve()
This way, the solution steps forward in time to optimize in parts. This strategy was used to optimize a High Altitude Long Endurance (HALE) UAV and is called Receding Horizon Control.
Martin, R.A., Gates, N., Ning, A., Hedengren, J.D., Dynamic
Optimization of High-Altitude Solar Aircraft Trajectories Under
Station-Keeping Constraints, Journal of Guidance, Control, and
Dynamics, 2018, doi: 10.2514/1.G003737.

3D gaussian generator/transform

How do I generate 3 Gaussian variables? I know that the Box-Muller algorithm can be used to convert two uniform variables (U1, U2) into two Gaussian variables (X, Y), but how do I generate the third one (Z)?
A simple way:
It is unlikely in this sort of game that you will need 3 Gaussian variates just once.
You need some store variable that can contain either a triplet of Gaussian variates or nothing (Null, Nothing, Empty, whatever that is in your programming language; you didn't tell us which one).
Initially, the store contains nothing (empty).
When asked for a triplet:
if the store contains a triplet, just return that triplet.
And mark the store as empty.
if the store is empty, run Box-Muller 3 times.
That gives you 2 triplets.
Put the second triplet in the store.
Return the first triplet.
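The store-based steps above can be sketched as follows. This is a minimal illustration, assuming Python and the standard 2D Box-Muller transform; the class name `GaussTripletSource` and its methods are hypothetical, not taken from any library.

```python
import math
import random

class GaussTripletSource:
    """Hands out Gaussian triplets, caching the spare triplet between calls."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._store = None  # either None (empty) or a stored triplet

    def _box_muller(self):
        # Standard Box-Muller: two uniforms -> two independent unit Gaussians.
        u1 = 1.0 - self._rng.random()  # lies in (0, 1], so log(u1) is finite
        u2 = self._rng.random()
        r = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        return r * math.cos(angle), r * math.sin(angle)

    def next_triplet(self):
        # If the store holds a triplet, return it and mark the store empty.
        if self._store is not None:
            triplet, self._store = self._store, None
            return triplet
        # Otherwise run Box-Muller 3 times: 6 variates, i.e. 2 triplets.
        g = []
        for _ in range(3):
            g.extend(self._box_muller())
        self._store = tuple(g[3:])  # keep the second triplet for the next call
        return tuple(g[:3])         # return the first triplet

source = GaussTripletSource(seed=1)
print(source.next_triplet())
print(source.next_triplet())
```

Every second call costs nothing beyond reading the store, so no generated variate is wasted.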
An alternative way for the mathematically inclined programmer:
If one just tries to adapt Box-Muller to 3 dimensions, the sole tricky part is to get the norm of the random 3D vector. The rest is about the 2 spherical angles θ (theta) and φ (phi), which is easy stuff.
It turns out that in 3 dimensions, that norm involves the inverse of the incomplete gamma function.
And if you have Python and Numpy/Scipy, this is function scipy.special.gammaincinv.
We can thus write this code:
import math
import numpy.random as rd
import scipy.special as sp
# convert 3 uniform [0,1) variates into 3 unit Gaussian variates:
def boxMuller3d(u3):
    u0,u1,u2 = u3 # 3 uniform random numbers in [0,1)
    gamma = u0
    norm2 = 2.0 * sp.gammaincinv(1.5, gamma) # "regularized" versions
    norm = math.sqrt(norm2)
    zr = (2.0 * u1) - 1.0 # sin(theta)
    hr = math.sqrt(1.0 - zr*zr) # cos(theta)
    phi = 2.0 * math.pi * u2
    xr = hr * math.cos(phi)
    yr = hr * math.sin(phi)
    g3 = list(map(lambda c: c*norm, [xr, yr, zr]))
    return g3
# generate 3 uniform variates and convert them into 3 unit Gaussian variates:
def gauss3(rng):
    u3 = rng.uniform(0.0, 1.0, 3)
    g3 = boxMuller3d(u3)
    return g3
To (partly) check correctness, we can use this small main program, which displays the statistical moments of order 1 to 4 of the resulting random series:
randomSeed = 42
rng = rd.default_rng(randomSeed)
count = 3000000 # (X,Y,Z) triplet count
variates = []
for i in range(count):
    g3 = gauss3(rng)
    variates += g3
ln = len(variates)
print("length=%d\n" % ln)
# Checking statistical moments of order 1 to 4:
m1 = sum(variates) / ln
m2 = sum( map(lambda x: x*x, variates) ) / ln
m3 = sum( map(lambda x: x**3, variates) ) / ln
m4 = sum( map(lambda x: x**4, variates) ) / ln
print("m1=%g m2=%g m3=%g m4=%g\n" % (m1,m2,m3,m4))
Test program output:
length=9000000
m1=-0.000455911 m2=1.00025 m3=-0.000563454 m4=3.00184
We thus can see that these moments are reasonably close to their mathematically expected values, respectively 0,1,0,3.

Generate matrix in Keras based on row and column

How could I create a layer in Keras that outputs a matrix of given dimensions (e.g. m, n), with each cell having a value based on its row and column?
Here is the formula:
A[i, 2j] = i / (10**(2*j))
A[i, 2j+1] = i / (10**(2*j))
I tried looking at the Lambda layer, but it seems Keras passes only the cell value and not the indices. Are there any other options (not a loop)?
You could do the following:
from keras.layers import Input
import keras.backend as K
import numpy as np
def CustomConstantInput(m, n):
    x = np.arange(m)
    y = 10 ** (2 * (np.arange(n) // 2))
    matrix = x[:, None] / y[None, :]
    print(matrix)
    fixed_input = K.constant(matrix)
    return Input(tensor=fixed_input)
t = CustomConstantInput(3, 4)
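To see that the broadcasting expression matches the formula in the question, here is a NumPy-only sketch (no Keras involved). The helper names `index_matrix` and `index_matrix_loop` are illustrative assumptions; the loop version exists only as a reference to compare against.

```python
import numpy as np

def index_matrix(m, n):
    # Column c belongs to the pair j = c // 2, so its divisor is 10**(2*j).
    rows = np.arange(m, dtype=float)
    col_scale = 10.0 ** (2 * (np.arange(n) // 2))
    return rows[:, None] / col_scale[None, :]

def index_matrix_loop(m, n):
    # Direct transcription of A[i, 2j] = A[i, 2j+1] = i / 10**(2*j).
    A = np.zeros((m, n))
    for i in range(m):
        for c in range(n):
            j = c // 2
            A[i, c] = i / (10 ** (2 * j))
    return A

A = index_matrix(3, 4)
print(A)
print(np.allclose(A, index_matrix_loop(3, 4)))
```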

Finding center point given distance matrix

I have a matrix (really a loaded image) in which every element is a L2 distance from some unknown center point.
Here is a trivial example
A = [1.4142 1.0000 1.4142 2.2361]
[1.0000 0.0000 1.0000 2.0000]
[1.4142 1.0000 1.4142 2.2361]
In this case, the center is obviously at coordinate (1,1) (index A[1,1] in a 0-indexed matrix or 2D array).
However, in the case where my centers are not constrained to be integer indices, it's no longer as obvious. For example, given this matrix B, where is my center coordinate?
B = [3.0292 1.9612 2.8932 5.8252]
[1.2292 0.1612 1.0932 4.0252]
[1.4292 0.3612 1.2932 4.2252]
How would you find that the answer in this case is at row 1.034 and column 1.4?
I am aware of the trilateration solution (having provided MATLAB code to visualize that in 3D previously), but is there a more efficient way (e.g. one without a matrix inversion)?
This question is sort of language agnostic, as I am looking more for algorithmic help. If you could stick to MATLAB, Python, or C++ though in a solution, that would be great ;-).
While I have no experience with similar tasks, I read up on the topic and tried a few things.
For someone unfamiliar with the topic it is hard to grasp, and the resources I found are a bit chaotic.
What is still unclear to me in terms of theory:
is the problem as stated above a convex optimization problem (local minimum = global minimum, which would mean access to powerful solvers)?
there are many more resources about more generic problems (Sensor Network Localization), which are non-convex and for which extremely complex methods have been developed
is your trilateration approach able to exploit more than 3 points (trilateration vs. multilateration)? At least this code does not seem able to, which would mean bad performance with noise.
Here is some example code with two approaches:
A: Convex-optimization: SOCP-Relaxation
Follows SECOND-ORDER CONE PROGRAMMING RELAXATION OF SENSOR NETWORK LOCALIZATION
Not impressive performance, but should be powerful as approximation for big-data
Guaranteed global-optimum for this relaxation!
Implemented with cvxpy
B: Nonlinear-programming optimization
Implemented using scipy.optimize
Pretty much perfect in my synthetic experiments; even good results in noisy case; despite the fact we are using numerical-differentiation (automatic-diff hard to use here)
An additional remark:
Your example B surely has some (pretty bad) noise or some other problem, in my opinion, since my approaches are completely off on it, while approach B in particular shines on my synthetic data (at least that's my impression).
Code:
import numpy as np
import cvxpy as cvx
from scipy.spatial.distance import cdist
from scipy.optimize import minimize
np.random.seed(1)
""" Create noise-free (not anymore!) fake-problem """
real_x = np.random.random(size=2) * 3
M, N = 5, 10
NOISE_DISTS = 0.1
pos = np.array([(i,j) for i in range(M) for j in range(N)]) # ugly -> tile/repeat/stack
real_x_stacked = np.vstack([real_x for i in range(pos.shape[0])])
Y = cdist(pos, real_x[np.newaxis])
Y += np.random.normal(size=Y.shape)*NOISE_DISTS # Let's add some noise!
print('-----')
print('PROBLEM')
print('-------')
print('real x: ', real_x)
print('dist mat: ', np.round(Y,3).T)
""" Helper """
def cost(x, Y, pos):
    res = np.linalg.norm(pos - x, ord=2, axis=1) - Y.ravel()
    return np.linalg.norm(res, 2)
print('cost with real_x (check vs. noisy): ', cost(real_x, Y, pos))
""" SOLVER SOCP """
# Note: written against the pre-1.0 cvxpy API (sum_entries, vstack(*args), reshape(expr, rows, cols))
def solve_socp_relax(pos, Y):
    x = cvx.Variable(2)
    y = cvx.Variable(pos.shape[0])
    fake_stack = [x for i in range(pos.shape[0])] # hacky
    objective = cvx.sum_entries(cvx.norm(y - Y))
    x_stacked = cvx.reshape(cvx.vstack(*fake_stack), pos.shape[0], 2) # hacky
    constraints = [cvx.norm(pos - x_stacked, 2, axis=1) <= y]
    problem = cvx.Problem(cvx.Minimize(objective), constraints)
    problem.solve(solver=cvx.ECOS, verbose=False)
    return x.value.T
""" SOLVER NLP """
def solve_nlp(pos, Y):
    sol = minimize(cost, np.zeros(pos.shape[1]), args=(Y, pos), method='BFGS')
    # print(sol)
    return sol.x
""" TEST """
print('-----')
print('SOLVE')
print('-----')
socp_relax_sol = solve_socp_relax(pos, Y)
print('SOCP RELAX SOL: ', socp_relax_sol)
nlp_sol = solve_nlp(pos, Y)
print('NLP SOL: ', nlp_sol)
Output:
-----
PROBLEM
-------
real x: [ 1.25106601 2.16097348]
dist mat: [[ 2.444 1.599 1.348 1.276 2.399 3.026 4.07 4.973 6.118 6.746
2.143 1.149 0.412 0.766 1.839 2.762 3.851 4.904 5.734 6.958
2.377 1.432 0.856 1.056 1.973 2.843 3.885 4.95 5.818 6.84
2.711 2.015 1.689 1.939 2.426 3.358 4.385 5.22 6.076 6.97
3.422 3.153 2.759 2.81 3.326 4.162 4.734 5.627 6.484 7.336]]
cost with real_x (check vs. noisy): 0.665125233772
-----
SOLVE
-----
SOCP RELAX SOL: [[ 1.95749275 2.00607253]]
NLP SOL: [ 1.23560791 2.16756168]
Edit: A further speedup can be achieved (especially at large scale) by using nonlinear least squares instead of the more general NLP approach! My results are still the same (as expected if the problem were convex). Timings for NLP vs. NLS can look like 9 vs. 0.5 seconds!
This is my recommended method!
from scipy.optimize import least_squares
def solve_nls(pos, Y):
    def res(x, Y, pos):
        return np.linalg.norm(pos - x, ord=2, axis=1) - Y.ravel()
    sol = least_squares(res, np.zeros(pos.shape[1]), args=(Y, pos), method='lm')
    # print(sol)
    return sol.x
The second approach (NLP) in particular will also run for much bigger instances (cvxpy's overhead hurts; that's not a downside of the SOCP solver itself, which should scale much better).
Here is some output for M, N = 500, 1000 with some more noise:
-----
PROBLEM
-------
real x: [ 12.51066014 21.6097348 ]
dist mat: [[ 24.706 23.573 23.693 ..., 1090.29 1091.216
1090.817]]
cost with real_x (check vs. noisy): 353.354267797
-----
SOLVE
-----
NLP SOL: [ 12.51082419 21.60911561]
used: 5.9552763315495625 # SECONDS
So in my experiments it works, but I won't give any global-convergence or reconstruction guarantees (I am still missing some theory).
At first I thought about using the global optimum of the relaxed SOCP problem as the initial point for the NLP solver, but I did not find any example where this is needed!
Some just-for-fun visuals using:
M, N = 20, 30
NOISE_DISTS = 0.2
...
import matplotlib.pyplot as plt
plt.imshow(Y.reshape(M, N), cmap='viridis', interpolation='none')
plt.colorbar()
plt.scatter(nlp_sol[1], nlp_sol[0], color='red', s=20)
plt.xlim((0, N))
plt.ylim((0, M))
plt.show()
And some super noisy case (nice performance!):
M, N = 50, 100
NOISE_DISTS = 5
-----
PROBLEM
-------
real x: [ 12.51066014 21.6097348 ]
dist mat: [[ 22.329 18.745 27.588 ..., 94.967 80.034 91.206]]
cost with real_x (check vs. noisy): 354.527196716
-----
SOLVE
-----
NLP SOL: [ 12.44158986 21.50164637]
used: 0.01050068340320306
If I understand correctly, you have a matrix A, where A[i,j] holds the distance from (i,j) to some unknown point (y,x). You could find (y,x) like this:
Square each element of A, to make a matrix B say.
We then want to find (y,x) so
(y-i)*(y-i) + (x-j)*(x-j) = B[i,j]
Subtracting each equation from the 0,0 equation and rearranging:
2*i*y + 2*j*x = B[0,0] + i*i + j*j - B[i,j]
This can be solved by linear least squares. Note that since there are 2 unknowns, the matrix inversion (better, factorisation) involved will be on a 2x2 matrix and so is not time consuming. You could indeed, given just the dimensions of A, work out the required matrix and its inverse analytically.
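The linearised system above can be sketched in NumPy as follows. This is a minimal illustration, assuming noise-free Euclidean distances on an integer grid; the function name `find_center` and the test centre (1.3, 2.7) are made up for the example.

```python
import numpy as np

def find_center(A):
    """Estimate (y, x) from a matrix of distances A[i, j] to an unknown point."""
    A = np.asarray(A, dtype=float)
    B = A ** 2                                  # squared distances
    rows, cols = np.indices(A.shape)
    i = rows.ravel().astype(float)
    j = cols.ravel().astype(float)
    # One equation per cell: 2*i*y + 2*j*x = B[0,0] + i*i + j*j - B[i,j]
    M = np.column_stack([2.0 * i, 2.0 * j])
    rhs = B[0, 0] + i ** 2 + j ** 2 - B.ravel()
    (y, x), *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return y, x

# Synthetic check with a non-integer centre (assumption: exact distances).
ii, jj = np.indices((3, 4))
A_demo = np.hypot(ii - 1.3, jj - 2.7)
print(find_center(A_demo))
```

Because every cell contributes an equation, the estimate uses all of the distances at once rather than three of them, which should also make it more tolerant of noise than plain trilateration.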

Why is fitting a polynomial faster than changing polynomial basis?

Given some data points on the interval [–1, 1] and the best-fit Chebyshev polynomial to those points, I want to convert the Chebyshev polynomial to a Legendre polynomial.
There are 2 ways to do it, as shown in the code below. The direct way is to call convert(kind = Legendre) on the Chebyshev polynomial, which took 19.591 seconds. The alternative is to call Legendre.fit on the data points, which took only 3.356 seconds.
import numpy as np
from numpy.polynomial import Chebyshev, Legendre
x = np.linspace(-1, 1, 1000)
y = 1.0 / (1 + x ** 2) + 1e-3 * np.random.random(1000)
T = Chebyshev.fit(x, y, 99)
from timeit import timeit
timeit("T.convert(kind = Legendre)", setup = "from __main__ import x, y, T, Legendre",
number = 200)
timeit("Legendre.fit(x, y, 99)", setup = "from __main__ import x, y, Legendre",
number = 200)
Question: Why is Legendre.fit much faster than convert(kind = Legendre)? Am I doing it wrongly?
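As a sanity check on the premise, the sketch below compares the two routes at a deliberately small degree. It rests on the assumption that both fits solve the same unweighted least-squares problem over the same polynomial space, so they should describe the same polynomial up to rounding. The degree 8 and the seeded generator are choices made for this illustration, not values from the question.

```python
import numpy as np
from numpy.polynomial import Chebyshev, Legendre

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 1000)
y = 1.0 / (1 + x ** 2) + 1e-3 * rng.random(1000)

deg = 8  # small on purpose; the question uses 99
T = Chebyshev.fit(x, y, deg)

L_converted = T.convert(kind=Legendre)   # route 1: change of basis
L_fitted = Legendre.fit(x, y, deg)       # route 2: refit in the Legendre basis

# Largest pointwise disagreement between the two resulting polynomials.
max_diff = np.max(np.abs(L_converted(x) - L_fitted(x)))
print(max_diff)
```

If the two polynomials agree here, the timing difference in the question is about the cost of the basis conversion itself rather than about one route being wrong.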
