Integer programming with a covariance matrix in the objective - Gekko

I am trying to solve an optimization problem that requires recalculating the covariance matrix as a function of the decision variables.
I am able to do this with scipy.optimize.minimize by calling numpy.cov inside my objective function. However, I also need integer constraints, and I cannot think of a way to tackle the issue with cvxpy or gekko, since most optimization examples online assume a fixed covariance matrix.
Below is my code for scipy:
import numpy as np
from scipy.optimize import minimize

# df (room RevPAR history) is loaded elsewhere
room_revpar = np.array(df.iloc[:, 1:10])
nla = np.array([753.2, 1077.6, 1278.6, 1463.9, 1657.0,
                1990.6, 2404.9, 2754.6, 3464.72])
min_nla = 270517.16
max_nla = 271270.359995

def objective(x, room_revpar, nla, sign=-1.0):
    room_revenue = room_revpar * x
    avg_revenue = np.mean(room_revenue, axis=0)
    total_revenue = sum(avg_revenue)
    cov_matrix = np.cov(room_revenue.T)
    total_nla = np.matmul(x.T, nla)
    weights = x * nla / total_nla
    portfolio_sd = np.sqrt(np.matmul(np.matmul(weights.T, cov_matrix), weights))
    adj_risk = total_revenue / portfolio_sd
    return sign * adj_risk  # negate to maximize with a minimizer

def constraint1(x, nla, min_nla):
    total_nla = np.matmul(x.T, nla)
    return total_nla - min_nla

def constraint2(x, nla, max_nla):
    total_nla = np.matmul(x.T, nla)
    return max_nla - total_nla

con1 = {'type': 'ineq', 'fun': constraint1, 'args': (nla, min_nla)}
con2 = {'type': 'ineq', 'fun': constraint2, 'args': (nla, max_nla)}

x = np.ones(9)
sol = minimize(objective, x0=x, args=(room_revpar, nla),
               constraints=(con1, con2), options={'maxiter': 100000})
I would appreciate it if anybody has a solution! Thank you.

The covariance of xi and yi is calculated explicitly with np.cov().
import numpy as np
xi = [2.1,2.5,3.6,4.0]
yi = [8,10,12,14]
print(np.cov(xi,yi))
The function np.cov(xi,yi) returns a 2x2 symmetric matrix:

[[cov(xi,xi), cov(xi,yi)],
 [cov(xi,yi), cov(yi,yi)]]

For these values of xi and yi it evaluates to:

[[0.80333333 2.26666667]
 [2.26666667 6.66666667]]
Gekko needs a symbolic form of the covariance formula for the gradient-based optimizer. Below is a function cov() that creates the symbolic covariance calculation with Gekko variables.
import numpy as np
from gekko import GEKKO

def cov(m, x, y, ddof=1):
    '''Calculate the covariance of x and y
    Inputs:
      m: Gekko model
      x: x vector of equal length to y
      y: y vector of equal length to x
      [ddof=1]: delta degrees of freedom
    Returns:
      c: covariance as a Gekko variable
    '''
    nx = len(x); ny = len(y)  # length of x, y
    if nx != ny:
        print('Error: mismatch of x and y lengths')
    xm = m.sum(x)/nx  # mean of x
    ym = m.sum(y)/ny  # mean of y
    c = m.Var()       # covariance
    m.Equation(c == (m.sum([(x[i]-xm)*(y[i]-ym)
                            for i in range(nx)]))/(nx-ddof))
    return c
m = GEKKO()
n = 4
x = m.Array(m.Param, n)
y = m.Array(m.Param, n)
xi = [2.1, 2.5, 3.6, 4.0]
yi = [8, 10, 12, 14]
for i in range(n):
    x[i].value = xi[i]
    y[i].value = yi[i]
c0 = cov(m, x, y, ddof=0)
c1 = cov(m, x, y)
m.solve(disp=False)

print('Covariance (Numpy) population cov: ', np.cov(xi, yi, ddof=0)[0, 1])
print('Covariance (Numpy) sample cov: ', np.cov(xi, yi)[0, 1])
print('Covariance (Gekko) population cov: ', c0.value[0])
print('Covariance (Gekko) sample cov: ', c1.value[0])
Gekko and Numpy produce the same results for the fixed xi and yi values:
Covariance (Numpy) population cov: 1.7
Covariance (Numpy) sample cov: 2.2666666666666666
Covariance (Gekko) population cov: 1.7
Covariance (Gekko) sample cov: 2.2666666667
Now that the cov() function is verified, you can switch x and y to be calculated integer values such as:
x = m.Array(m.Var,n,lb=0,ub=10,integer=True)
y = m.Array(m.Var,n,lb=0,ub=5,integer=True)
To obtain an integer solution, switch to the APOPT solver with m.options.SOLVER=1 before the m.solve() command.
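As a minimal sketch of how these pieces fit together (not from the original answer; the objective and the cov(x,y) <= 2 limit are arbitrary placeholders for illustration):

import numpy as np
from gekko import GEKKO

m = GEKKO(remote=False)
n = 4
x = m.Array(m.Var, n, lb=0, ub=10, integer=True)
y = m.Array(m.Var, n, lb=0, ub=5, integer=True)
c = cov(m, x, y)             # symbolic covariance from the function above
m.Equation(c <= 2.0)         # hypothetical risk-style constraint
m.Maximize(m.sum(list(x)) + m.sum(list(y)))
m.options.SOLVER = 1         # APOPT for mixed-integer problems
m.solve(disp=False)
print([xi.value[0] for xi in x], [yi.value[0] for yi in y])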

Related

Calculating kernel-density estimate in parallel

I want to perform a kernel-density estimate on a grid in parallel. There is some scaling involved, and I can't figure out how to obtain the same results when performing the calculation on different threads.
To illustrate this, I'm using a simple example from the scipy documentation.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def measure(n):
    "Measurement model, return two coupled measurements."
    m1 = np.random.normal(size=n)
    m2 = np.random.normal(scale=0.5, size=n)
    return m1+m2, m1-m2

m1, m2 = measure(2000)
xmin = m1.min()
xmax = m1.max()
ymin = m2.min()
ymax = m2.max()
kde_bandwith = 0.5

X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([X.ravel(), Y.ravel()])
values = np.vstack([m1, m2])
kernel = stats.gaussian_kde(values, bw_method=kde_bandwith)
Z = np.reshape(kernel.evaluate(positions).T, X.shape)
I then perform the last two lines on n_split subsets of the data (in practice this would run on different threads, but here I keep a simple loop for simplicity):
## split
n_split = 10
values_split = np.array_split(values, n_split, axis=1)
results = []
for v in values_split:
    kernel_n = stats.gaussian_kde(v, bw_method=kde_bandwith)
    Z_n = np.reshape(kernel_n.evaluate(positions).T, X.shape)
    results.append(Z_n)
Z2 = np.sum(results, axis=0)
Here I simply recombine the results from each subset by summing the Z_n surfaces together into Z2.
Finally, a simple piece of code to compare the results: left is the original version, middle is the result obtained by combining the subsets, and right is the difference between the two.
fig = plt.figure(figsize=(9,2), dpi=200)
ax1 = fig.add_subplot(1,3,1)
ax2 = fig.add_subplot(1,3,2)
ax3 = fig.add_subplot(1,3,3)

c1 = ax1.imshow(np.rot90(Z), cmap=plt.cm.gist_earth_r,
                extent=[xmin, xmax, ymin, ymax])
#ax1.plot(m1, m2, 'k.', markersize=2)
ax1.set_xlim([xmin, xmax])
ax1.set_ylim([ymin, ymax])
plt.colorbar(c1, shrink=0.6)

c2 = ax2.imshow(np.rot90(Z2), cmap=plt.cm.gist_earth_r,
                extent=[xmin, xmax, ymin, ymax])
#ax2.plot(m1, m2, 'k.', markersize=2)
ax2.set_xlim([xmin, xmax])
ax2.set_ylim([ymin, ymax])
plt.colorbar(c2, shrink=0.6)

c3 = ax3.imshow(np.rot90(Z2-Z), cmap=plt.cm.gist_earth_r,
                extent=[xmin, xmax, ymin, ymax])
ax3.plot(m1, m2, 'k.', markersize=0.1)
ax3.set_xlim([xmin, xmax])
ax3.set_ylim([ymin, ymax])
plt.colorbar(c3, shrink=0.6)
plt.show()
Note: the magnitude of the combined KDE looks n_split times larger, but even after normalizing by 1/n_split, or len(values_split)/len(values) (which in this case is equal), the results are not exactly the same, shown in the next figure for reference.
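The mismatch is expected: gaussian_kde estimates its bandwidth matrix from the data it is given, so each subset KDE uses a different covariance than the full-data KDE, and no constant rescaling of the subset surfaces can reproduce Z exactly. A sketch of a split that does reproduce Z (my suggestion, not from the original post): fit the KDE once on all of the data and parallelize over chunks of the evaluation grid instead, since the evaluation points are independent.

from concurrent.futures import ThreadPoolExecutor

kernel_full = stats.gaussian_kde(values, bw_method=kde_bandwith)
position_chunks = np.array_split(positions, n_split, axis=1)  # split the grid, not the data

with ThreadPoolExecutor(max_workers=n_split) as pool:
    parts = list(pool.map(kernel_full.evaluate, position_chunks))
Z3 = np.concatenate(parts).reshape(X.shape)  # matches Z up to float round-off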

Bayesian calibration for ode system

I tried to use the pymc3 package in Python to calibrate a first-order ODE system in a Bayesian way.
I started with a toy ODE system: dy1/dt = y2; dy2/dt = -b*y2 - c*sin(y1), where b and c are the parameters I want to calibrate.
First, I generated outputs for y1 and y2 over t in [0, 10] by setting the parameters to b = 0.25 and c = 5.0, with normally distributed noise ~ N(0, 0.7^2). Then I calibrated the ODE system with prior distributions b ~ N(0, 1), c ~ N(7, 9), and sigma ~ HalfNormal.
But it gave these errors: (1) TypeError: float() argument must be a string or a number, not 'TensorVariable'; (2) ValueError: setting an array element with a sequence.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
import pymc3 as pm

def pend(y, t, b, c):
    theta, omega = y
    dydt = [omega, -b*omega - c*np.sin(theta)]
    return dydt

true_b = 0.25
true_c = 5.0
y0 = [np.pi - 0.1, 0.0]
t = np.linspace(0, 10, 101)
sol = odeint(pend, y0, t, args=(true_b, true_c))
true_sigma = 0.7
noise = np.random.randn(101, 2)*true_sigma
Y_obs = sol + noise

pend_model = pm.Model()
with pend_model:
    # Priors for unknown model parameters
    b = pm.Normal('b', mu=0, sd=1)
    c = pm.Normal('c', mu=7, sd=3)
    sigma = pm.HalfNormal('sigma', sd=1)
    # Expected value of outcome
    mu = odeint(pend, y0, t, args=(b, c))
    # Likelihood (sampling distribution) of observations
    Y = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y_obs)
    trace = pm.sample(draws=5000, tune=500, chains=1)
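The direct cause of both errors is that scipy's odeint is a plain NumPy routine: it tries to convert its inputs to floats, so it cannot integrate an ODE whose parameters b and c are symbolic TensorVariables. A hedged sketch of one fix, using the pm.ode.DifferentialEquation wrapper that pymc3 (>= 3.7) provides for exactly this case; it is slow, so draws are reduced here, and the priors are carried over from the question:

from pymc3.ode import DifferentialEquation

def pend_rhs(y, t, p):
    # p[0] = b, p[1] = c; use pm.math ops so the expression stays symbolic
    return [y[1], -p[0]*y[1] - p[1]*pm.math.sin(y[0])]

pend_ode = DifferentialEquation(func=pend_rhs, times=t, n_states=2, n_theta=2)

with pm.Model() as pend_model:
    b = pm.Normal('b', mu=0, sd=1)
    c = pm.Normal('c', mu=7, sd=3)
    sigma = pm.HalfNormal('sigma', sd=1)
    mu = pend_ode(y0=y0, theta=[b, c])
    Y = pm.Normal('Y_obs', mu=mu, sd=sigma, observed=Y_obs)
    trace = pm.sample(draws=1000, tune=500, chains=1)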

Simulation of Lorenz System crashes

I only just started exploring Gekko and tried simulating the Lorenz ODE system.
Unfortunately, I get an error ("no solution found") for a simple standard case that runs fine using scipy.
The problem solves fine if I only integrate up to, say, time=0.5 instead of 1.0.
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
m = GEKKO()
m.time = np.arange(0.0, 1.0, 0.01)
sigma = 10.; rho = 28.0; beta = 8./3.
x = m.Var(value=10); y = m.Var(value=10); z = m.Var(value=10)
t = m.Param(value=m.time)
m.Equation(x.dt()== sigma*(y - x))
m.Equation(y.dt()== x*(rho -z) - y)
m.Equation(z.dt()== x*y - beta*z)
m.options.IMODE = 4
m.options.NODES = 4
m.solve(disp=False)
plt.plot(x.value, y.value)
plt.show()
Sequential simulation with m.options.IMODE=7 solves successfully. The simultaneous simulation is a much larger problem (1782 variables/equations vs. 18 variables/equations). It also solves successfully if you reduce m.options.NODES=3 (1188 variables) or increase the maximum iterations with m.options.MAX_ITER=300. It failed previously because it needed 267 iterations to find the solution, and the default maximum iteration limit for IPOPT is 200.
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
m = GEKKO()
m.time = np.arange(0.0, 1.0, 0.01)
sigma = 10.; rho = 28.0; beta = 8./3.
x = m.Var(value=10); y = m.Var(value=10); z = m.Var(value=10)
t = m.Param(value=m.time)
m.Equation(x.dt()== sigma*(y - x))
m.Equation(y.dt()== x*(rho -z) - y)
m.Equation(z.dt()== x*y - beta*z)
m.options.IMODE = 7
m.options.NODES = 4
m.solve(disp=True)
plt.plot(x.value, y.value)
plt.show()
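Per the explanation above, the original simultaneous mode also works once the IPOPT iteration limit is raised; only the option lines change:

m.options.IMODE = 4        # simultaneous simulation
m.options.NODES = 4
m.options.MAX_ITER = 300   # default of 200 is too few; ~267 iterations needed
m.solve(disp=True)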

KNN Classifier ValueError: Unknown label type: 'continuous'

We are going to introduce an extra 20-dimensional predictor z, which does NOT actually play a role in generating y. Yet, in estimation, we do not know this fact and will use both x and z as predictors in the KNN algorithm.
We need to generate z, the 20-dimensional predictors, of the same sizes. Each z is a 20-dimensional multivariate normal random variable, with mean (0,0,…,0) and identity covariance matrix (so that the 20 elements are independent standard normal random variables). The resulting z is a 100x20 matrix, with each row being a data point with 20 dimensions. For a fixed k=15, fit a KNN model to predict y with (x,z), and measure the training and test MSE. (1 mark)
What's wrong with the code below?
# training data (beta0..beta3 and sigma are defined earlier in the assignment)
import numpy as np
x = np.arange(0, 5, 0.05)
f_x = beta0 + beta1 * x + beta2 * x**2 + beta3 * x**3
epsilon = np.random.normal(loc=0, scale=sigma, size=100)
y = f_x + epsilon

## test data
x_test = np.arange(0, 6, 0.1)
f_x_test = beta0 + beta1 * x_test + beta2 * x_test**2 + beta3 * x_test**3
epsilon_test = np.random.normal(loc=0, scale=sigma, size=len(x_test))
y_test = f_x_test + epsilon_test

z = np.random.multivariate_normal(size=100, mean=[0]*20, cov=np.identity(20))
z_test = np.random.multivariate_normal(size=60, mean=[0]*20, cov=np.identity(20))
train_x = np.concatenate((np.expand_dims(x, axis=1), z), axis=1)
test_x = np.concatenate((np.expand_dims(x_test, axis=1), z_test), axis=1)

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import mean_squared_error

knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(train_x, y)
y_pred_train = knn.predict(train_x)
y_pred_test = knn.predict(test_x)
mse_train = mean_squared_error(y, y_pred_train)
mse_test = mean_squared_error(y_test, y_pred_test)
Instead of KNeighborsClassifier, use KNeighborsRegressor: y is continuous, and the classifier raises the "Unknown label type" error because it expects discrete class labels.
from sklearn.neighbors import KNeighborsRegressor
knn_reg_model = KNeighborsRegressor(n_neighbors=15, algorithm='auto').fit(train_x, y)
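The rest of the question's code then works unchanged; a quick sketch of the MSE computation with the regressor:

from sklearn.metrics import mean_squared_error

y_pred_train = knn_reg_model.predict(train_x)
y_pred_test = knn_reg_model.predict(test_x)
mse_train = mean_squared_error(y, y_pred_train)
mse_test = mean_squared_error(y_test, y_pred_test)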

Matplotlib Contour not Connected

As a Python novice, I am trying to visualize the curve x**2*y + x*y**2 - x**4 - y**4 = 0 with Matplotlib:
from matplotlib.pyplot import *
from numpy import *
delta = 0.025
p = arange(-0.5, 1.5, delta)
q = arange(-0.5, 1.5, delta)
X, Y = meshgrid(p, q)
Z = X**2*Y + X*Y**2 - X**4 - Y**4
fig, ax = subplots()
CS = ax.contour(X, Y, Z, [0], colors ='k')
ax.set_title('x**2*y + x*y**2 - x**4 - y**4')
show()
the result is that the plot is not connected, whereas mathematically it should be. How can the level set be made connected?
It's a year later, but for future reference: you just have to choose a smaller step size delta. With your delta = 0.025 you get the disconnected picture; with delta = 0.001 you get an accurate, connected picture (a minimal version of the fix is sketched below).
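The question's code with only the step size changed (everything else carried over unchanged):

import numpy as np
import matplotlib.pyplot as plt

delta = 0.001  # finer grid so the zero level set is resolved as one connected curve
p = np.arange(-0.5, 1.5, delta)
q = np.arange(-0.5, 1.5, delta)
X, Y = np.meshgrid(p, q)
Z = X**2*Y + X*Y**2 - X**4 - Y**4

fig, ax = plt.subplots()
ax.contour(X, Y, Z, [0], colors='k')
ax.set_title('x**2*y + x*y**2 - x**4 - y**4')
plt.show()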
