Handle missing values with a statsmodels Local Linear Trend model

I am using statsmodels to fit a Local Linear Trend state space model, which inherits from the sm.tsa.statespace.MLEModel class, using the code from the example in the documentation:
https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_local_linear_trend.html
The state space model and Kalman filter should handle missing values naturally, but when I add some null values the model outputs nulls. In another example in the docs, implementing SARIMAX, missing data appears to be handled automatically:
https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_internet.html
Is there a way to handle missing values in the same way for a Local Linear Trend model?

Chad Fulton replied to the issue I raised on github:
https://github.com/statsmodels/statsmodels/issues/7684
The statespace models can indeed handle NaN values in the endog variable. I think the issue is that in this example code, the starting parameters are computed as:
@property
def start_params(self):
    return [np.std(self.endog)]*3
To handle NaN values in the data, you'd want to replace this with:
@property
def start_params(self):
    return [np.nanstd(self.endog)]*3
This worked.
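For reference, here is a minimal sketch of the subclass with the fix applied, condensed from the linked notebook (the simulated series and the trimming are my own additions, not part of the original answer):

import numpy as np
import statsmodels.api as sm

class LocalLinearTrend(sm.tsa.statespace.MLEModel):
    def __init__(self, endog):
        # Two states: level and trend
        super(LocalLinearTrend, self).__init__(
            endog, k_states=2, k_posdef=2,
            initialization='approximate_diffuse', loglikelihood_burn=2)
        # y_t = level_t + e_t;  level_{t+1} = level_t + trend_t + w_t
        self['design'] = np.array([[1.0, 0.0]])
        self['transition'] = np.array([[1.0, 1.0], [0.0, 1.0]])
        self['selection'] = np.eye(2)

    @property
    def param_names(self):
        return ['sigma2.measurement', 'sigma2.level', 'sigma2.trend']

    @property
    def start_params(self):
        # np.nanstd ignores NaNs, so the optimizer gets finite starting values
        return [np.nanstd(self.endog)] * 3

    def transform_params(self, unconstrained):
        return unconstrained ** 2  # keep variances positive

    def untransform_params(self, constrained):
        return constrained ** 0.5

    def update(self, params, *args, **kwargs):
        params = super(LocalLinearTrend, self).update(params, *args, **kwargs)
        self['obs_cov', 0, 0] = params[0]
        self['state_cov', 0, 0] = params[1]
        self['state_cov', 1, 1] = params[2]

# Simulated demo: the Kalman filter skips the missing observations
np.random.seed(0)
y = np.cumsum(0.1 + 0.1 * np.random.randn(200)) + np.random.randn(200)
y[50:70] = np.nan  # a block of missing values
res = LocalLinearTrend(y).fit(disp=False)
print(res.params)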

Related

Possible to set file name for h2o.save_model() (rather than simply use the model_id value)?

Trying to save an h2o model with some specific name that differs from the model's model_id field, but trying something like...
h2o.save_model(model=model,
               path='/some/path/then/filename',
               force=False)
just creates a dir/file structure like
some
|__path
   |__then
      |__filename
         |__<model_id>
as opposed to
some
|__path
   |__then
      |__filename
Is this possible to do from the save_model method?
I can't (or at least hesitate to) simply change the model_id before calling the save method, because the model names have timestamps appended to them to avoid name collisions with other models that may be on the h2o cluster. I am trying to remove these timestamps when saving to disk, and simplifying the name on the cluster before saving creates a window in which a naming collision can occur if other processes are also attempting to save such a model (of, say, a different timestamp).
Any way to get this behavior or other common alternatives / workarounds?
This is currently not possible; however, I created a feature request here. There is a related question here which shows a solution for R (it could be adapted to Python). The work-around is simply to rename the file manually with a few lines of R/Python code.
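In Python, the manual rename might look like this (a sketch; 'desired_name' is a placeholder and model is assumed to exist):

import os
import shutil
import h2o

# h2o.save_model returns the full path of the file it wrote,
# which ends in the model's model_id.
saved_path = h2o.save_model(model=model,
                            path='/some/path/then/filename',
                            force=False)

# Work-around: rename the file on disk without touching the model_id
# on the cluster, so no naming-collision window is opened.
desired_path = os.path.join(os.path.dirname(saved_path), 'desired_name')
shutil.move(saved_path, desired_path)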

TFLearn - Metrics for validation and evaluation?

I'm using AlexNet model in TFLearn and there is a method to define the regression layer, which is:
tflearn.layers.estimator.regression(incoming, placeholder='default', optimizer='adam', loss='categorical_crossentropy', metric='default', learning_rate=0.001, dtype=tf.float32, batch_size=64, shuffle_batches=True, to_one_hot=False, n_classes=None, trainable_vars=None, restore=True, op_name=None, validation_monitors=None, validation_batch_size=None, name=None)
and it states that "A metric can also be provided, to evaluate the model performance." So I'm wondering: is this metric also used for validation, or only for evaluation? If it's not used for validation, then what metric does the validation use?
EDIT 1: I found out that the metric declared in the regression() method is actually used for validation as well. The default metric is Accuracy. However, one thing I don't understand is that when I don't use a validation_set (or set it to None), the training summary still outputs the acc value. So how is this accuracy value computed?
EDIT 2: Found the answer here: https://github.com/tflearn/tflearn/issues/357
The training accuracy acc is based on your training data, while the validation accuracy val_acc is based on the validation data. So omitting the validation data won't change the acc output.
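A sketch of how the pieces fit together (net, X and Y are assumed to be defined elsewhere):

import tflearn

# The metric passed here (Accuracy is the default) is used both for the
# training summary (acc) and, when a validation set is given to fit(),
# for the validation summary (val_acc).
net = tflearn.regression(net, optimizer='adam',
                         loss='categorical_crossentropy',
                         metric=tflearn.metrics.Accuracy())
model = tflearn.DNN(net)

# With validation_set=None the acc column is still printed, because it
# is computed on the training batches themselves.
model.fit(X, Y, validation_set=0.1, show_metric=True)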

Solr Spatial: Is it possible to filter by one geolocation field and sort by a different one?

We would like to perform a spatial search on one geo field but distance sort the results based on a second geo field. It seems that Solr supports this for the LatLonType. Here we simply add parameters to the geodist function.
The geodist(param1,param2,param3) function supports (optional) parameters:
param1: the sfield
param2: the latitude (pt)
param3: the longitude (pt)
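For example, with two LatLonType fields (the field names here are made up) we could filter on one and distance-sort by the other:

fq={!geofilt sfield=pickup_location pt=45.15,-93.85 d=50}
sort=geodist(dropoff_location,45.15,-93.85) asc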
Unfortunately, this doesn't seem to work with the SpatialRecursivePrefixTreeFieldType. However, we have to use SpatialRecursivePrefixTreeFieldType since we have several locations per document, which the LatLonType does not support. Is there any solution other than writing our own field type?
I finally figured it out. However, the solution is a bit of a hack. I created a plugin jar that contains a modified version of the GeoDistValueSourceParser class. Within this class I modified the parseSfield method to simply use a constant sfield, which should be used for sorting. Then I hooked the class up by adding the line
<valueSourceParser name="customdist" class="bla.search.function.distance.CustomGeoDistValueSourceParser"/>
to the solrconfig.xml. I still don't understand why GeoDistValueSourceParser isn't configurable; it shouldn't be too difficult to write it in a way that a different geo field can be specified for sorting.

jVectorMap define data series on the fly

What I am trying to do and is failing is this:
worldMap.series.regions[0] = new jvm.DataSeries({
    scale: ['#CCCCCC', '#FF0000'],
    normalizeFunction: 'polynomial',
    values: {'country_code': value...},
    min: minValue,
    max: maxValue
});
The error I get is that regions is not defined, so I'm doing that wrong.
What is the proper way of doing this, and how can I dispose of the data when I don't need it anymore (i.e. remove the colouring of countries, as if the map had been initialized with an empty dataset)?
Thank you.
If all you want to do is change the series data, you can use the DataSeries methods clear and setValues:
worldMap.series.regions[0].clear();
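And to recolour the map with a new dataset later on (the region codes and values below are placeholders):

worldMap.series.regions[0].setValues({US: 42, DE: 17});

// Calling clear() afterwards removes the colouring again, as if the
// map had been initialized with an empty dataset.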

Using invariant with Dexterity form and fieldsets

I have a content type derived from plone.directives.form.Schema; it has several dozen fields across four fieldsets. I'm trying to create a zope.interface.invariant that looks at fields from two different fieldsets.
From tracing the behaviour, it looks like the invariant is called once for each fieldset, but not for the entire form.
I'm aware I can provide my own handler and perform all the checks I need there, but that feels clunky compared to distinctly defined invariants. While the obvious solution is to move the related fields onto the same fieldset, the current setup reflects a layout that is logical to the end user.
Is there an existing hook where I could perform validation on multiple fields across fieldsets?
The answer seems to be no: z3c.form.group.Group.extractData calls z3c.form.form.BaseForm.extractData once for each group/fieldset, and this call already includes invariant validation.
Instead of registering your own handler, you could also override extractData:
import datetime

from five import grok
from plone.directives import form, dexterity
from zope.interface import Invalid
from z3c.form.interfaces import ActionExecutionError, WidgetActionExecutionError
# ... (the IMyEvent interface and the _ message factory are assumed imported)

class EditForm(dexterity.EditForm):
    grok.context(IMyEvent)

    def extractData(self, setErrors=True):
        data, errors = super(EditForm, self).extractData(setErrors)
        if None not in (data['start'], data['end']):
            if data['end'] < data['start']:
                raise WidgetActionExecutionError(
                    'end',
                    Invalid(_(u"End date should not lie before the start date.")))
            if data['end'] - data['start'] > datetime.timedelta(days=7):
                raise WidgetActionExecutionError(
                    'end',
                    Invalid(_(u"Duration of convention should be shorter than seven (7) days.")))
        return data, errors
Please note that this class derives from dexterity.EditForm, which includes Dexterity's default handlers, instead of form.SchemaForm.
WidgetActionExecutionError does not work reliably, though. For some fields, it produces a 'KeyError'.
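If WidgetActionExecutionError misbehaves for a particular field, one possible fallback (my sketch, not part of the original answer) is the form-wide ActionExecutionError that is already imported above; it is not tied to a single widget, so no widget key lookup is involved:

from zope.interface import Invalid
from z3c.form.interfaces import ActionExecutionError

# Inside extractData, instead of pinning the error to the 'end' widget:
raise ActionExecutionError(
    Invalid(_(u"End date should not lie before the start date.")))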
