I'm following a Time Series Analysis module on DataCamp and tried using the exact same code in a Jupyter notebook:
# Import the ARMA module from statsmodels
from statsmodels.tsa.arima_model import ARMA
# Forecast the first AR(1) model
mod = ARMA(simulated_data_1, order=(1,0))
res = mod.fit()
res.plot_predict(start=990, end=1010)
plt.show()
However, I keep getting the following error:
NotImplementedError:
statsmodels.tsa.arima_model.ARMA and statsmodels.tsa.arima_model.ARIMA have
been removed in favor of statsmodels.tsa.arima.model.ARIMA (note the .
between arima and model) and statsmodels.tsa.SARIMAX.
statsmodels.tsa.arima.model.ARIMA makes use of the statespace framework and
is both well tested and maintained. It also offers alternative specialized
parameter estimators.
After making some adjustments to use the correct package, I was able to get a chart, but without the confidence interval shown in the first image.
# Import the ARMA module from statsmodels
from statsmodels.tsa.arima.model import ARIMA
# Fit an AR(1) model to the first simulated data
model = ARIMA(simulated_data_1, order=(1,0,0))
result = model.fit()
plt.plot(simulated_data_1[990:])
plt.plot(result.predict(start=990, end=1010), color='red')
plt.show()
How can I add in the confidence interval?
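For reference, here is a minimal sketch of one way to get the interval with the new API, using get_prediction and conf_int on the result object from the code above (the fill_between styling is just illustrative):

import numpy as np
import matplotlib.pyplot as plt

# Predictions for the last observations plus the forecast horizon
pred = result.get_prediction(start=990, end=1010)
mean = pred.predicted_mean
ci = np.asarray(pred.conf_int())  # columns: lower bound, upper bound (95% by default)

x = np.arange(990, 1011)
plt.plot(np.arange(990, len(simulated_data_1)), simulated_data_1[990:])
plt.plot(x, mean, color='red')
plt.fill_between(x, ci[:, 0], ci[:, 1], color='red', alpha=0.2)
plt.show()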
I am trying to find the best alpha for a Ridge model without CV, using Yellowbrick's ManualAlphaSelection API. My code is pretty basic and was taken from the Yellowbrick documentation. However, it does not work:
from yellowbrick.regressor import ManualAlphaSelection
from sklearn.linear_model import Ridge
model = ManualAlphaSelection(Ridge(), scoring='neg_mean_squared_error')
model.fit(X_train, y_train)
model.show()
Python raises the message: 'Ridge' is not a CV regularization model; try ManualAlphaSelection instead.
But this message is wrong, because ManualAlphaSelection is already being used.
This actually appears to be a bug in our library 😅
Would you mind opening up a bug report on GitHub so we can be sure to fix it? Thank you for checking out Yellowbrick!
From my reading of the LightGBM documentation, one is supposed to define categorical features in the Dataset method. So I have the following code:
import lightgbm as lgb

cats = ['C1', 'C2']
d_train = lgb.Dataset(X, label=y, categorical_feature=cats)
However, I received the following error message:
/app/anaconda3/anaconda3/lib/python3.7/site-packages/lightgbm/basic.py:1243: UserWarning: Using categorical_feature in Dataset.
warnings.warn('Using categorical_feature in Dataset.')
Why did I get the warning message?
I presume that you get this warning in a call to lgb.train. That function also has a categorical_feature argument, and its default value is 'auto', which means taking the categorical columns from a pandas.DataFrame (documentation). The warning, which is emitted at this line, indicates that, although lgb.train requested that categorical features be identified automatically, LightGBM will use the features specified in the dataset instead.
To avoid the warning, you can pass the same categorical_feature argument to both lgb.Dataset and lgb.train. Alternatively, you can construct the dataset with categorical_feature=None and specify the categorical features only in lgb.train, as sketched below.
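For instance, the second option could look roughly like this (a sketch; X, y and lgb_params stand in for whatever data and parameters you are using):

import lightgbm as lgb

# Keep the Dataset agnostic about categoricals...
d_train = lgb.Dataset(X, label=y, categorical_feature=None)
# ...and declare them only when training
booster = lgb.train(lgb_params, d_train, categorical_feature=['C1', 'C2'])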
As user andrey-popov described, you can use lgb.train's categorical_feature parameter to get rid of this warning.
Below is a simple example of how you could do it:
# Define categorical features
cat_feats = ['item_id', 'dept_id', 'store_id',
             'cat_id', 'state_id', 'event_name_1',
             'event_type_1', 'event_name_2', 'event_type_2']
...
# Define the datasets with the categorical_feature parameter
train_data = lgb.Dataset(X.loc[train_idx],
                         Y.loc[train_idx],
                         categorical_feature=cat_feats,
                         free_raw_data=False)
valid_data = lgb.Dataset(X.loc[valid_idx],
                         Y.loc[valid_idx],
                         categorical_feature=cat_feats,
                         free_raw_data=False)
# And train using the categorical_feature parameter
lgb.train(lgb_params,
          train_data,
          valid_sets=[valid_data],
          verbose_eval=20,
          categorical_feature=cat_feats,
          num_boost_round=1200)
This is less of an answer to the OP and more of an answer for people who are using the sklearn API and encounter this issue.
For those of you who are using sklearn API, especially using one of the cross_val methods from sklearn, there are two solutions you could consider using.
Sklearn API solution
A solution that worked for me was to cast the categorical fields to the category datatype in pandas.
If you are using a pandas DataFrame, LightGBM should then automatically treat those columns as categorical. From the documentation:
integer codes will be extracted from pandas categoricals in the Python-package
It would make sense for this to be the sklearn-API equivalent of setting categoricals in the Dataset object.
But keep in mind that LightGBM does not officially support virtually any of the non-core parameters for the sklearn API, and they say so explicitly:
**kwargs is not supported in sklearn, it may cause unexpected issues.
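As an illustration, here is a minimal sketch of the category-dtype approach with the sklearn API (the column names, the regressor, and the use of cross_val_score are just placeholders):

import lightgbm as lgb
from sklearn.model_selection import cross_val_score

# Cast the categorical columns of the DataFrame to the pandas 'category' dtype
cat_cols = ['C1', 'C2']  # hypothetical column names
X[cat_cols] = X[cat_cols].astype('category')

# The sklearn estimator then picks the categoricals up from the dtypes automatically
model = lgb.LGBMRegressor()
scores = cross_val_score(model, X, y, cv=5)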
Adaptive Solution
The other, more sure-fire solution to being able to use methods like cross_val_predict and such is to just create your own wrapper class that implements the core Dataset/Train under the hood but exposes a fit/predict interface for the cv methods to latch onto. That way you get the full functionality of lightGBM with only a little bit of rolling your own code.
The below sketches out what this could look like.
import lightgbm as lgb

class LGBMSKLWrapper:
    """Thin wrapper exposing a sklearn-style fit/predict interface around the native API."""

    def __init__(self, categorical_variables, params):
        self.categorical_variables = categorical_variables
        self.params = params
        self.model = None

    def fit(self, X, y):
        # Build the native Dataset with the categorical features and train a Booster
        my_dataset = lgb.Dataset(X, y, categorical_feature=self.categorical_variables)
        self.model = lgb.train(params=self.params, train_set=my_dataset)
        return self

    def predict(self, X):
        return self.model.predict(X)
The above lets you load up your parameters when you create the object, and then passes them on to train when the client calls fit.
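A hypothetical usage sketch (the parameter values and variable names are placeholders):

params = {'objective': 'regression', 'learning_rate': 0.05}
wrapper = LGBMSKLWrapper(categorical_variables=cat_feats, params=params)
wrapper.fit(X_train, y_train)
preds = wrapper.predict(X_valid)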
Is it possible to plot a pyLDAvis visualization with a Mallet implementation of LDA? I have no trouble with LDA_Model, but when I use Mallet I get:
'LdaMallet' object has no attribute 'inference'
My code:
pyLDAvis.enable_notebook()
vis = pyLDAvis.gensim.prepare(mallet_model, corpus, id2word)
vis
Run this line to convert your Mallet model into an LdaModel before calling pyLDAvis:
[Edit]: I edited the code to use the built-in function in gensim instead. I just tried it, but I am unable to get meaningful results with pyLDAvis on a converted Mallet model; the topics seem to contain random terms. Has anybody encountered this before?
import gensim
model = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(mallet_model)
I got this from the link below; do explore it (lines 565-590):
https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/wrappers/ldamallet.py#L359
I hope I have helped.
from gensim.models.ldamodel import LdaModel

# Manually build a gensim LdaModel from a trained LdaMallet wrapper
# (despite its name, this function converts Mallet -> gensim)
def convertldaGenToldaMallet(mallet_model):
    model_gensim = LdaModel(
        id2word=mallet_model.id2word, num_topics=mallet_model.num_topics,
        alpha=mallet_model.alpha, eta=0,
    )
    # Copy the word-topic counts from the Mallet model and resync the state
    model_gensim.state.sstats[...] = mallet_model.wordtopics
    model_gensim.sync_state()
    return model_gensim
I found this blog post helpful; it works directly with the state file produced by MALLET, which is also produced when using Gensim's Mallet wrapper.
While using Tensorflow v.1.0.1 and Keras 2.0 and running this code:
from keras import backend as K
if K.image_data_format() == 'channels_first':
input_shape = (1, img_width, img_height)
I'm getting the following error:
AttributeError: module 'keras.backend' has no attribute
'image_data_format'
How can I solve this?
It's because image_data_format is defined in keras.backend.common in Keras 2.0.
If you have an earlier version, you can instead check the value of dim_ordering in your config file (the default is the TensorFlow ordering 'tf', corresponding to channels last).
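In Keras 1.x the equivalent check would be something along these lines (a sketch; image_dim_ordering was the Keras 1 name for this setting, and img_width/img_height come from the snippet in the question):

from keras import backend as K

# Keras 1.x: 'th' means Theano ordering (channels first), 'tf' means TensorFlow ordering (channels last)
if K.image_dim_ordering() == 'th':
    input_shape = (1, img_width, img_height)
else:
    input_shape = (img_width, img_height, 1)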
Two ways to solve this
Solution 1 (if you are using tensorflow.keras)
from tensorflow.keras import backend as K #instead of from keras import backend as K
Solution 2 (if you are using Keras directly)
from keras import backend as K
and replace K.image_data_format() with K.common.image_dim_ordering()
In the latest Keras version, i.e. keras==2.4.3, I resolved this issue using the code below:
from keras.backend import image_data_format
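With that import, the check from the question becomes (a minimal sketch):

if image_data_format() == 'channels_first':
    input_shape = (1, img_width, img_height)
else:
    input_shape = (img_width, img_height, 1)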
Is there a way to save and recover a trained Neural Network in PyBrain, so that I don't have to retrain it each time I run the script?
PyBrain's neural networks can be saved and loaded using either Python's built-in pickle/cPickle modules or PyBrain's XML NetworkWriter.
# Using pickle
from pybrain.tools.shortcuts import buildNetwork
import pickle

net = buildNetwork(2, 4, 1)

# Save the network (pickle needs binary mode)
fileObject = open('filename', 'wb')
pickle.dump(net, fileObject)
fileObject.close()

# Load it back
fileObject = open('filename', 'rb')
net = pickle.load(fileObject)
fileObject.close()
Note that cPickle is implemented in C and should therefore be much faster than pickle. Usage is mostly the same, so just import and use cPickle instead.
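For example (Python 2 only; on Python 3 the faster C implementation is used by pickle automatically):

import cPickle as pickle  # drop-in replacement for pickle in the snippet above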
# Using NetworkWriter
from pybrain.tools.shortcuts import buildNetwork
from pybrain.tools.customxml.networkwriter import NetworkWriter
from pybrain.tools.customxml.networkreader import NetworkReader

net = buildNetwork(2, 4, 1)

# Save the network to XML, then load it back
NetworkWriter.writeToFile(net, 'filename.xml')
net = NetworkReader.readFrom('filename.xml')
The NetworkWriter and NetworkReader work great. I noticed that after saving and loading via pickle, the network is no longer changeable via the training functions. Thus, I would recommend using the NetworkWriter method.
NetworkWriter is the way to go. With pickle you can't retrain the network, as Jorg says.
You need something like this:
from pybrain.tools.shortcuts import buildNetwork
from pybrain.tools.customxml import NetworkWriter
from pybrain.tools.customxml import NetworkReader
net = buildNetwork(4,6,1)
NetworkWriter.writeToFile(net, 'filename.xml')
net = NetworkReader.readFrom('filename.xml')