I keep getting NaN values for my beta estimated by rolling-window linear regression on financial data

I'm supposed to answer this question:
calculate the exposure of each stock's excess return (Y) to the CATFIN factor (X) using a 60-month rolling window estimation period, i.e., the first estimation window should be 1/1973-12/1977, the second 2/1973-1/1978, and so on, and the last estimation window should be 1/2017-12/2021.
I tried the regression below to estimate the exposure, which is the beta, but I get only NaN values for my beta.
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS

# Set PERMNO and date as the index
data_test.set_index(['PERMNO', 'date'], inplace=True)

# Define the rolling window size (60 months in this case)
window = 60

# Define the X and Y variables for the regression
X = data_test['CATFIN']
Y = data_test['excess_return']

# Define a function to run the rolling regression
def rolling_beta(data_test):
    if len(data_test) >= window:
        model = RollingOLS(endog=data_test['excess_return'],
                           exog=data_test['CATFIN'], window=window)
        rolling_beta = model.fit().params
        if not np.isnan(rolling_beta).any():
            return rolling_beta[-1]
        else:
            return np.nan
    else:
        return np.nan

# Apply the rolling_beta function to each PERMNO group
rolling_betas = data_test.groupby('PERMNO').apply(rolling_beta)

# Convert the rolling betas series to a dataframe
rolling_betas_df = pd.DataFrame(rolling_betas, columns=['beta'])

# Reset the index of the rolling betas dataframe
rolling_betas_df = rolling_betas_df.reset_index()

# Print the rolling betas for each PERMNO
print(rolling_betas_df)
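For context, RollingOLS.fit().params always contains NaN for the first window - 1 rows of each fit, so a check like np.isnan(rolling_beta).any() is true even when the estimation succeeds. A minimal, self-contained sketch (synthetic data, not the CRSP/CATFIN panel above) shows this behavior:

import numpy as np
import pandas as pd
from statsmodels.regression.rolling import RollingOLS

rng = np.random.default_rng(0)
n = 120  # 120 months of synthetic data for one stock
x = pd.Series(rng.normal(size=n), name='CATFIN')
y = 0.5 * x + rng.normal(scale=0.1, size=n)  # true beta of 0.5 plus noise

params = RollingOLS(endog=y, exog=x, window=60).fit().params
print(params.head(3))  # NaN: fewer than 60 observations available yet
print(params.tail(3))  # estimated rolling betas near 0.5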

Related

Optimal parameters not found: Number of calls to function has reached maxfev = 100

I'm new to Python. I tried to fit an adjusted curve to the data, but when I draw the graph, only the original data appears, together with the message "Optimal parameters not found: Number of calls to function has reached maxfev = 1000." Could you help me find my mistake?
%matplotlib inline
import matplotlib.pylab as m
from scipy.optimize import curve_fit
import numpy as num
import scipy.optimize as optimize

xData = num.array([0,0,100,200,250,300,400], dtype="float")
yData = num.array([0,0,0,0,75,100,100], dtype="float")
m.plot(xData, yData, 'ro', label='Original data')

def fun(x, a, b):
    return a + b * num.log(x)

popt, pcov = optimize.curve_fit(fun, xData, yData, p0=[1,1], maxfev=1000)
print(popt)
x = num.linspace(1,400,7)
m.plot(x, fun(x, *popt), label='Fitted function')
m.xlabel('concentration')
m.ylabel('% mortality')
m.legend()
m.grid()
The model in your code is "a + b * num.log(x)". Because your data contains x values of 0.0, evaluating log(0.0) yields errors and prevents the fitting routine from working. Sometimes these x values of 0.0 can be replaced with very small numbers, as log(small number) will not fail - but in this case the equation and data do not appear to match, so that technique alone would not be sufficient here.
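As a quick check (not part of the original code), the failure is easy to reproduce in isolation:

import numpy as np
with np.errstate(divide='ignore'):
    print(np.log(0.0))  # -inf
# a + b * log(0) is therefore -inf, so the residuals handed to
# curve_fit's solver are non-finite and it cannot converge,
# eventually exhausting maxfev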
My thought is that a different equation would be a better model for this data. I performed an equation search using your data, and found that several different sigmoidal type equations gave suspiciously good fits to this data set - which is not surprising because of the small number of data points.
The sigmoidal equations I tried were all extremely sensitive to the initial parameter estimates. Here is a graphical Python fitter using scipy's Differential Evolution genetic algorithm module to determine the initial parameter estimates for curve_fit's non-linear solver. That scipy module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, requiring bounds within which to search. Here those bounds are taken from the data maximum and minimum values.
I personally would not use this fit precisely because the small number of data points is giving such suspiciously good fits, and I strongly recommend taking additional data points if at all possible. I could, however, not find any equations with fewer than three parameters that would fit the data.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings

xData = numpy.array([0,0,100,200,250,300,400], dtype="float")
yData = numpy.array([0,0,0,0,75,100,100], dtype="float")

def func(x, a, b, c): # Sigmoid B equation from zunzun.com
    return a / (1.0 + numpy.exp(-1.0 * (x - b) / c))

# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)

def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)

    parameterBounds = []
    parameterBounds.append([minX, maxX]) # search bounds for a
    parameterBounds.append([minX, maxX]) # search bounds for b
    parameterBounds.append([0.0, 2.0]) # search bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x

# generate initial parameter estimates within the bounds
# (by default, differential_evolution polishes its best result internally)
geneticParameters = generate_Initial_Parameters()

# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()

modelPredictions = func(xData, *fittedParameters)

absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()

##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData), 100)
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

How to use a vector set point with MPC in order to give the program information on how the future set point will change

I am using an MPC to run a heater system. Currently it takes an individual value from my set point array at a given point in time as the target for the process. I would like to give it the current desired value plus a few future set point values so that it can better adjust as the set point changes. How can I give Gekko a vector so that it can better adjust to future set points?
This is the part of my code that currently updates my setpoint values:
T1[i] = a.T1
T2[i] = a.T2
TC1.MEAS = T1[i]
TC2.MEAS = T2[i]
DT = .1
TC1.SPHI = sp1[i] + DT #sp1 and sp2 are set point arrays for the two heaters
TC1.SPLO = sp1[i] - DT
TC2.SPHI = sp2[i] + DT
TC2.SPLO = sp2[i] - DT
m.solve(disp=False)
The Gekko CV object only uses a scalar value of an array for SP, SPHI, and SPLO, so some modification is needed to have the optimizer consider future setpoint changes. The simple MPC application below shows how setpoints are used in Gekko.
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
m = GEKKO()
m.time = np.linspace(0,20,41)
p = m.MV(value=0, lb=0, ub=100) # Declare MV
p.STATUS = 1 # allow optimizer to change
p.DCOST = 0.1 # smooth MV response
p.DMAX = 10.0 # max move each cycle
v = m.CV(value=0) # Declare CV
v.STATUS = 1 # add CV to the objective
m.options.CV_TYPE = 2 # squared error
v.SP = 40 # set point
v.TR_INIT = 1 # set point trajectory
v.TAU = 5 # time constant of trajectory
m.Equation(10*v.dt() == -v + 2*p)
m.options.IMODE = 6 # control
m.solve(disp=False)
# get additional solution information
import json
with open(m.path+'//results.json') as f:
    results = json.load(f)
plt.figure()
plt.subplot(2,1,1)
plt.plot(m.time,p.value,'b-',label='MV Optimized')
plt.legend()
plt.ylabel('Input')
plt.subplot(2,1,2)
plt.plot(m.time,results['v1.tr'],'k-',label='Reference Trajectory')
plt.plot(m.time,v.value,'r--',label='CV Response')
plt.ylabel('Output')
plt.xlabel('Time')
plt.legend(loc='best')
plt.show()
There is a single setpoint target of 40, and a reference trajectory is defined with a time constant of 5. The MPC is not able to follow the reference trajectory exactly because of the rate-of-change constraint DMAX=10 (see the quick check below). There are two options if you'd like the optimizer to know about future setpoint changes.
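Quick check: loosening the move constraint (a hypothetical tweak to the script above, not a recommended tuning) lets the CV track the reference trajectory much more closely:

p.DMAX = 100          # hypothetical: allow much larger moves per cycle
m.solve(disp=False)   # re-solve and re-plot to compare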
Option 1: Don't use CV, Use Feedforward Parameter
If you don't need the reference trajectory and are okay with a squared-error objective, then the easiest method to project future setpoint changes is to define your own MPC objective with a feedforward parameter vector of setpoint values. The example below shows that the optimizer anticipates the setpoint change and proactively moves before it to minimize the overall sum of squared error. This may be desirable in many circumstances, but may be undesirable in manufacturing where there are product grade changes and the end of the production campaign should be in-spec before producing transition material.
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
m = GEKKO()
m.time = np.linspace(0,20,41)
p = m.MV(value=0, lb=0, ub=100) # Declare MV
p.STATUS = 1 # allow optimizer to change
p.DCOST = 0.1 # smooth MV response
p.DMAX = 10.0 # max move constraint
v = m.Var(value=0)
sp = np.ones(41)*40
sp[20:] = 60
s = m.Param(value=sp)
m.Obj((s-v)**2)
m.Equation(10*v.dt() == -v + 2*p)
m.options.IMODE = 6 # control
m.solve(disp=False)
plt.figure()
plt.subplot(2,1,1)
plt.plot(m.time,p.value,'b-',label='MV Optimized')
plt.legend()
plt.ylabel('Input')
plt.subplot(2,1,2)
plt.plot(m.time,sp,'k-',label='Setpoint')
plt.plot(m.time,v.value,'r--',label='CV Response')
plt.ylabel('Output')
plt.xlabel('Time')
plt.legend(loc='best')
plt.show()
Option 2: Error as CV
If it is desirable to use the reference trajectory and Gekko built-in CV options, then an option is to define a new error variable e and control that instead. The error variable always has a setpoint of zero and the feedforward setpoint is implemented as a feedforward parameter.
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
m = GEKKO()
m.time = np.linspace(0,20,41)
p = m.MV(value=0, lb=0, ub=100) # Declare MV
p.STATUS = 1 # allow optimizer to change
p.DCOST = 0.1 # smooth MV response
p.DMAX = 10.0 # max move constraint
v = m.Var(value=0)
sp = np.ones(41)*40
sp[20:] = 60
s = m.Param(value=sp)
e = m.CV(value=0) # Declare CV
e.STATUS = 1 # add CV to the objective
m.options.CV_TYPE = 2 # squared error
e.SP = 0 # set point
e.TR_INIT = 1 # error trajectory
e.TAU = 5 # time constant of trajectory
m.Equation(e==s-v)
m.Equation(10*v.dt() == -v + 2*p)
m.options.IMODE = 6 # control
m.solve(disp=False)
plt.figure()
plt.subplot(2,1,1)
plt.plot(m.time,p.value,'b-',label='MV Optimized')
plt.legend()
plt.ylabel('Input')
plt.subplot(2,1,2)
plt.plot(m.time,sp,'k-',label='Setpoint')
plt.plot(m.time,v.value,'r--',label='CV Response')
plt.ylabel('Output')
plt.xlabel('Time')
plt.legend(loc='best')
plt.show()
At every time step, Gekko generates the setpoint as an array over the future control horizon, and that array is normally filled with the single value you assigned. However, you can give the setpoint as an array yourself, as shown below.
sp1 = np.array([[1,2,3,4,5],
                [2,3,4,5,6],
                [3,4,5,6,7],
                [4,5,6,7,8]])
Then you can assign each row of the matrix at the corresponding time step, just as you did in your question.
DT = .1
TC1.SPHI = sp1[i] + DT
Note:
The setpoint array (i.e., each row of your setpoint matrix) must have the same length as the control horizon (m.time).
You might want to set the setpoint trajectory option to 0 (TR_INIT = 0) if you don't want the reference trajectory to filter the setpoint sequence in your array. A sketch of how to build such a matrix follows.
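Putting those notes together, here is one hedged sketch of how the rows of sp1 could be built so that each row spans the control horizon (the horizon length 5 and the sequence sp_full are illustrative assumptions; TC1 and DT come from the question):

import numpy as np

horizon = 5                               # must match len(m.time) (assumption)
sp_full = np.array([1,2,3,4,5,6,7,8])     # assumed full future setpoint sequence
# row i holds the next `horizon` setpoints, starting at control cycle i
sp1 = np.array([sp_full[i:i+horizon]
                for i in range(len(sp_full) - horizon + 1)])
print(sp1)   # reproduces the 4x5 matrix shown above

DT = .1
# at cycle i, the whole row (a vector over the horizon) would be assigned:
#   TC1.SPHI = sp1[i] + DT
#   TC1.SPLO = sp1[i] - DT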

Is it possible to get the data of biased and unbiased predicted controlled variables when using APMonitor for Model Predictive Control in Python?

I am trying to build Python code for Model Predictive Control using APMonitor. However, I don't want to view the results on a third-party online server. Instead, I want to collect the data of the biased and unbiased predictions and plot them in Python myself.
Try this in Python Gekko:
# get additional solution information
import json
with open(m.path+'//results.json') as f:
    results = json.load(f)
You can get the unbiased model result from the results dictionary with the key v.name, and the biased model prediction with the key v.name + '.bcv'. This gives you access to the raw data; the example below also shows how to plot the raw reference-trajectory information from the JSON data.
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
m = GEKKO()
m.time = np.linspace(0,20,41)
# Parameters
mass = 500
b = m.Param(value=50)
K = m.Param(value=0.8)
# Manipulated variable
p = m.MV(value=0, lb=0, ub=100)
p.STATUS = 1 # allow optimizer to change
p.DCOST = 0.1 # smooth out gas pedal movement
p.DMAX = 20 # slow down change of gas pedal
# Controlled Variable
v = m.CV(value=0)
v.STATUS = 1 # add the SP to the objective
m.options.CV_TYPE = 2 # squared error
v.SP = 40 # set point
v.TR_INIT = 1 # set point trajectory
v.TAU = 5 # time constant of trajectory
# Process model
m.Equation(mass*v.dt() == -v*b + K*b*p)
m.options.IMODE = 6 # control
m.solve(disp=False,GUI=True)
# get additional solution information
import json
with open(m.path+'//results.json') as f:
    results = json.load(f)
plt.figure()
plt.subplot(2,1,1)
plt.plot(m.time,p.value,'b-',label='MV Optimized')
plt.legend()
plt.ylabel('Input')
plt.subplot(2,1,2)
plt.plot(m.time,results['v1.tr'],'k-',label='Reference Trajectory')
plt.plot(m.time,v.value,'r--',label='CV Response')
plt.ylabel('Output')
plt.xlabel('Time')
plt.legend(loc='best')
plt.show()
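To pull the biased and unbiased CV predictions described above out of the same results dictionary, a sketch that relies on the key convention stated earlier (in this example v.name is 'v1'):

# continuing from the script above, after results.json is loaded
unbiased = results[v.name]            # unbiased model prediction ('v1')
biased = results[v.name + '.bcv']     # biased prediction ('v1.bcv')
plt.figure()
plt.plot(m.time, unbiased, 'b-', label='Unbiased')
plt.plot(m.time, biased, 'r--', label='Biased')
plt.xlabel('Time')
plt.legend()
plt.show()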

Is there a way to plot confusion matrix from H2O?

I know H2O can use
model_perf = model.model_performance(input)
model_perf.confusion_matrix
to output the confusion matrix. But is there a way to get the confusion matrix table to create a plot?
You have the function you need from scikit-learn, as indicated here. You just need to convert the output of your H2OFrames to a pandas DataFrame. An example is shown below:
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import unique_labels
%matplotlib inline
h2o.init()
h2o.cluster().show_status()
# import the cars dataset:
# this dataset is used to classify whether or not a car is economical based on
# the car's displacement, power, weight, and acceleration, and the year it was made
cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")
# print(cars["economy_20mpg"].isna().sum())
cars[~cars["economy_20mpg"].isna()]["economy_20mpg"].isna().sum()
cars = cars[~cars["economy_20mpg"].isna()]
# convert response column to a factor
cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()
# set the predictor names and the response column name
predictors = ["displacement","power","weight","acceleration","year"]
response = "economy_20mpg"
# split into train and validation sets
train, valid = cars.split_frame(ratios = [.8], seed = 1234)
# try using the `y` parameter:
# first initialize your estimator
cars_gbm = H2OGradientBoostingEstimator(seed = 1234, sample_rate=.5)
# then train your model, where you specify your 'x' predictors, your 'y' the response column
# training_frame and validation_frame
cars_gbm.train(x = predictors, y = response, training_frame = train, validation_frame = valid)
The plotting function from the scikit-learn documentation:
def plot_confusion_matrix(y_true, y_pred, classes,
                          normalize=False,
                          title=None,
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if not title:
        if normalize:
            title = 'Normalized confusion matrix'
        else:
            title = 'Confusion matrix, without normalization'

    # Compute confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    # Only use the labels that appear in the data
    classes = classes[unique_labels(y_true, y_pred)]
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    fig, ax = plt.subplots()
    im = ax.imshow(cm, interpolation='nearest', cmap=cmap)
    ax.figure.colorbar(im, ax=ax)
    # We want to show all ticks...
    ax.set(xticks=np.arange(cm.shape[1]),
           yticks=np.arange(cm.shape[0]),
           # ... and label them with the respective list entries
           xticklabels=classes, yticklabels=classes,
           title=title,
           ylabel='True label',
           xlabel='Predicted label')

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")

    # Loop over data dimensions and create text annotations.
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, format(cm[i, j], fmt),
                    ha="center", va="center",
                    color="white" if cm[i, j] > thresh else "black")
    fig.tight_layout()
    return ax
Extract the values:
# specify the threshold you want to use to create integer labels
maxf1_threshold = cars_gbm.find_threshold_by_max_metric('f1')
# specify your true and prediction labels
y_true = cars["economy_20mpg"].as_data_frame()
y_pred = cars_gbm.predict(cars)
# convert predictions (uncalibrated probabilities) into integer labels
y_pred = (y_pred['p1'] >= maxf1_threshold).ifelse(1,0)
y_pred = y_pred.as_data_frame()
y_pred.columns = ['p1']

y_true1 = y_true.economy_20mpg
y_pred1 = y_pred.p1
class_names = np.array(cars["economy_20mpg"].levels()[0])

# Plot non-normalized confusion matrix
plot_confusion_matrix(y_true1, y_pred1, classes=class_names,
                      title='Confusion matrix')
[Image: the resulting confusion matrix plot]
Please note that there is a known bug in the H2O-3 confusion matrix, as noted here.
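If you only need the numbers, the table behind the H2O confusion matrix can usually be converted to pandas directly, without recomputing it in scikit-learn (hedged: the exact attribute names may vary across h2o-3 versions):

# hedged alternative: pull the H2O confusion matrix table into pandas
perf = cars_gbm.model_performance(valid)
cm_df = perf.confusion_matrix().table.as_data_frame()  # H2OTwoDimTable -> DataFrame
print(cm_df)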

Using "past" values to define current values in GEKKO equation

I am writing the GEKKO equations to determine a vehicle's gear-box ratio, which depends on the vehicle's previous derivatives. Is there a way to set a variable to the time-shifted value of another variable?
Ex:
v = 0, [1, 2, 3, 4, 5]
shifted_v = [0, 1, 2, 3, 4]
where the bracketed values are the horizon, the leading 0 is v's current value, and v is a state variable defined by equations.
One of the easiest ways to shift data sets is to use the numpy.roll function.
import numpy as np
x = np.linspace(0,5,6)
y = np.roll(x,-1) # shift left
y[-1] = 6
z = np.roll(x,1) # shift right
z[0] = -1
print('x: ' + str(x))
print('y: ' + str(y))
print('z: ' + str(z))
You can apply this strategy using Gekko variables by using the .value property such as:
import numpy as np
from gekko import GEKKO
m = GEKKO()
m.time = np.linspace(0,5,6)
x = m.Param(value=m.time)
y = m.Param()
y.value = np.roll(x.value,-1)
y.value[-1] = 6
z = m.Param()
z.value = np.roll(x.value,1)
z.value[0] = -1
There is also a TIME_SHIFT feature in Gekko that automatically shifts values as if they were advancing in time. The TIME_SHIFT option controls how many time steps the values are shifted with every solve; the shift happens at the beginning of the solve. Here is a more complete example with a visualization of the result.
import numpy as np
from gekko import GEKKO
import matplotlib.pyplot as plt
m = GEKKO()
m.time = np.linspace(0,5,6)
x = m.Param(value=m.time)
y = m.Param()
y.value = np.roll(x.value,-1)
y.value[-1] = 6
z = m.Param()
z.value = np.roll(x.value,1)
z.value[0] = -1
s = m.Var()
m.Equation(s==x+y-z)
m.options.IMODE=4
m.solve()
plt.subplot(2,1,1)
plt.plot(m.time,x.value,label='x')
plt.plot(m.time,y.value,label='y')
plt.plot(m.time,z.value,label='z')
plt.legend()
# solve a second time
m.options.TIME_SHIFT = 1 # default is 1
m.solve()
plt.subplot(2,1,2)
plt.plot(m.time,x.value,label='x')
plt.plot(m.time,y.value,label='y')
plt.plot(m.time,z.value,label='z')
plt.legend()
plt.show()
From your question, it appears that you need to calculate the previous derivative of a variable. If you need to time shift a value during the calculation, not just in the initialization phase, then I would recommend a discrete state space model with a delay of 1 time step. The linked example shows how to implement this with 4 steps of delay; you would modify the discrete state space matrices to have 1 step of delay between the derivative and the gear-box ratio.
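Gekko also has a built-in m.delay() function that constructs this kind of discrete state space delay automatically. A minimal sketch (assuming a recent gekko version where m.delay is available; a simple ramp stands in for your gear-box signals):

from gekko import GEKKO
import numpy as np

m = GEKKO(remote=False)
m.time = np.linspace(0, 5, 6)
u = m.Param(value=m.time)   # placeholder input signal: a simple ramp
y = m.Var()                 # u delayed by one time step
m.delay(u, y, steps=1)      # builds the discrete state space delay internally
m.options.IMODE = 4         # simulation mode
m.solve(disp=False)
print(u.value)              # [0, 1, 2, 3, 4, 5]
print(y.value)              # u shifted back by one step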
