I'm using the AlexNet model in TFLearn, and there is a method to define the regression layer:
tflearn.layers.estimator.regression (incoming, placeholder='default', optimizer='adam', loss='categorical_crossentropy', metric='default', learning_rate=0.001, dtype=tf.float32, batch_size=64, shuffle_batches=True, to_one_hot=False, n_classes=None, trainable_vars=None, restore=True, op_name=None, validation_monitors=None, validation_batch_size=None, name=None)
and it states that "A metric can also be provided, to evaluate the model performance." So I'm wondering: is this metric also used for validation, or only for evaluation? If it isn't used for validation, what metric is the validation based on?
EDIT 1: I found out that the metric declared in the regression() method is actually used for validation as well. The default metric is Accuracy. However, one thing I don't understand is that when I don't use validation_set (or set it to None), the training summary still outputs an acc value. So how is this accuracy value computed?
EDIT 2: Found the answer here: https://github.com/tflearn/tflearn/issues/357
The training accuracy acc is based on your training data, while the validation accuracy val_acc is based on the validation data, so omitting the validation data does not affect the reported training accuracy.
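For reference, here is a minimal TFLearn sketch of where this fits; the toy network and the X, Y arrays are placeholders, not the actual AlexNet setup. The metric passed to regression() drives acc on the training batches and, when a validation set is supplied, val_acc as well.

import tflearn

# Toy network standing in for AlexNet (placeholder shapes)
net = tflearn.input_data(shape=[None, 227, 227, 3])
net = tflearn.fully_connected(net, 17, activation='softmax')

# Declare the metric on the regression layer (Accuracy is also the default)
net = tflearn.regression(net, optimizer='adam',
                         loss='categorical_crossentropy',
                         metric=tflearn.metrics.Accuracy(),
                         learning_rate=0.001)

model = tflearn.DNN(net)

# acc is computed on the training batches; val_acc only appears when a
# validation_set is given (a fraction or an (X_val, Y_val) tuple)
model.fit(X, Y, n_epoch=10, batch_size=64,
          validation_set=0.1, show_metric=True)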
update
Can I split the small test set realB into a validation set realB-v and a test set realB-t, fine-tune the model while validating on realB-v, and then test on realB-t? Then I swap the validation set and the test set and train a new model. Can I report the average of the results from the two runs?
original post
I have a pre-trained model M trained on the real dataset realA, I test it on another real dataset realB and get very poor results because realA and realB have domain gaps. Since real images in realB are difficult to acquire, I decide to generate synthetic images like realB and use these images syntheticA to fine-tune the model M.
I wonder if I still need a validation set? If so, should the validation set be split from syntheticA or from realB? realB is already a very small set (300 images).
In my view, a validation set isn't necessary in this case. If I fine-tune the model directly and choose hyperparameters according to the accuracy on realB, it shouldn't cause generalization problems, because the images I use for fine-tuning are all synthetic.
I'd like to hear your views. Thank you.
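For what it's worth, the swapped-split evaluation from the update might look roughly like the sketch below; realB_x, realB_y, fine_tune() and evaluate() are placeholders for your own data and training/evaluation code, not library functions.

import numpy as np
from sklearn.model_selection import StratifiedKFold

# realB_x, realB_y: the 300 real images and their labels (placeholders)
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
scores = []
for val_idx, test_idx in skf.split(realB_x, realB_y):
    # Fine-tune M on syntheticA, validating on one half of realB ...
    model = fine_tune(M, train_set=syntheticA,
                      val_set=(realB_x[val_idx], realB_y[val_idx]))
    # ... and test on the other half; the next iteration swaps the halves
    scores.append(evaluate(model, realB_x[test_idx], realB_y[test_idx]))

print("average accuracy over the two swapped splits:", np.mean(scores))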
I am using statsmodels to fit a Local Linear Trend state space model which inherits from the sm.tsa.statespace.MLEModel class using the code from the example in the documentation:
https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_local_linear_trend.html
The state space model and Kalman filter should handle missing values naturally, but when I add some null values the state space model outputs nulls. In another example in the docs, implementing SARIMAX, missing data appears to be handled automatically:
https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_sarimax_internet.html
Is there a way to handle missing values in the same way for a Local Linear Trend model?
Chad Fulton replied to the issue I raised on github:
https://github.com/statsmodels/statsmodels/issues/7684
The statespace models can indeed handle NaN values in the endog variable. I think the issue is that in this example code, the starting parameters are computed as:
@property
def start_params(self):
    return [np.std(self.endog)]*3
To handle NaN values in the data, you'd want to replace this with:
@property
def start_params(self):
    return [np.nanstd(self.endog)]*3
This worked.
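For context, here is a condensed version of the LocalLinearTrend class from the linked notebook with the nanstd fix applied; the simulated series and the NaN positions at the end are purely illustrative.

import numpy as np
import statsmodels.api as sm

class LocalLinearTrend(sm.tsa.statespace.MLEModel):
    def __init__(self, endog):
        k_states = k_posdef = 2
        super(LocalLinearTrend, self).__init__(
            endog, k_states=k_states, k_posdef=k_posdef,
            initialization='approximate_diffuse',
            loglikelihood_burn=k_states)

        # Local linear trend system matrices
        self.ssm['design'] = np.array([1, 0])
        self.ssm['transition'] = np.array([[1, 1],
                                           [0, 1]])
        self.ssm['selection'] = np.eye(k_states)
        self._state_cov_idx = ('state_cov',) + np.diag_indices(k_posdef)

    @property
    def param_names(self):
        return ['sigma2.measurement', 'sigma2.level', 'sigma2.trend']

    @property
    def start_params(self):
        # np.nanstd ignores the missing observations
        return [np.nanstd(self.endog)] * 3

    def transform_params(self, unconstrained):
        return unconstrained ** 2

    def untransform_params(self, constrained):
        return constrained ** 0.5

    def update(self, params, *args, **kwargs):
        params = super(LocalLinearTrend, self).update(params, *args, **kwargs)
        self['obs_cov', 0, 0] = params[0]
        self[self._state_cov_idx] = params[1:]

# Illustrative series with a block of missing values
y = np.cumsum(np.random.randn(200)) + np.random.randn(200)
y[50:60] = np.nan
res = LocalLinearTrend(y).fit(disp=False)
print(res.summary())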
I need to use the interaction variable feature for multiclass classification with H2OGradientBoostingEstimator in H2O in Python. I am not sure which parameter to use or how to use it. Can anyone please help me out with this?
Currently, I am using the below code -
pros_gbm = H2OGradientBoostingEstimator(nfolds=0, seed=1234,
                                        keep_cross_validation_predictions=False,
                                        ntrees=10, max_depth=3, learn_rate=0.01,
                                        distribution='multinomial')
hist_gbm = pros_gbm.train(x=predictors, y=target, training_frame=hf_train,
                          validation_frame=hf_test, verbose=True)
GBM inherently creates interactions. You can extract information about feature interactions using the .feature_interaction() extractor method (for an H2O Model). More information is provided in the user guide and the Python docs.
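For example, a small sketch of the extractor applied to the GBM trained above (availability of the method depends on your H2O version):

# Pairwise feature-interaction statistics from the trained GBM
interactions = pros_gbm.feature_interaction(max_interaction_depth=2)
for table in interactions[:3]:   # a list of H2OTwoDimTable objects
    print(table)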
If you want to explicitly add a new column that is the interaction between two numerics, you could create that manually by multiplying the two (or more) columns together to get a new interaction column.
For categorical interactions, there is also the h2o.interaction() method in Python to create interaction columns in the data (prior to sending it to the GBM or any other algorithm).
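As a rough sketch, with made-up column names x1, x2, cat_a, cat_b standing in for your own features:

import h2o

# Numeric interaction: multiply two numeric columns into a new feature
hf_train["x1_x2"] = hf_train["x1"] * hf_train["x2"]
hf_test["x1_x2"] = hf_test["x1"] * hf_test["x2"]

# Categorical interaction: build pairwise interaction columns from factor columns
cat_inter = h2o.interaction(hf_train,
                            factors=["cat_a", "cat_b"],
                            pairwise=True,
                            max_factors=100,
                            min_occurrence=1)
hf_train = hf_train.cbind(cat_inter)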
This is my situation. I have over 400 features, many of which are probably useless and often zero. I would like to be able to:
train a model with a subset of those features
query that model for the features actually used to build that model
build a H2OFrame containing just those features (I get a sparse list of non-zero values for each row I want to predict.)
pass this newly constructed frame to H2OModel.predict() to get a prediction
I am pretty sure that what I found is unsupported, but it works for now (v 3.13.0.341). Is there a more robust/supported way of doing this?
model._model_json['output']['names']
The response variable appears to be the last item in this list.
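For illustration, the workaround might be used roughly like this, where rows is a hypothetical list of dicts holding each row's non-zero values:

import h2o

used = model._model_json['output']['names']   # unsupported; the response is the last entry
feature_cols = used[:-1]

# Build a frame with just those features, filling absent sparse values with 0
data = {c: [row.get(c, 0) for row in rows] for c in feature_cols}
frame = h2o.H2OFrame(data)
preds = model.predict(frame)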
In a similar vein, it would be nice to have a supported way of finding out which H2O version that the model was built under. I cannot find the version number in the json.
If you want to know which feature columns the model used after you have built a model, you can do the following in Python:
my_training_frame = your_model.actual_params['training_frame']
which will return some frame id
and then you can do
col_used = h2o.get_frame(my_training_frame)
col_used
EDITED (after comment was posted)
To get the columns use:
col_used.columns
Also, a quick way to check the version of a saved binary model is to try to load it into h2o: if it loads, it is the same version of h2o; if it isn't, you will get a warning.
You can also open the saved model file; the first line will list the version of H2O used to create it.
For a model saved as a mojo you can look at the model.ini file. It will list the version of H2O.
I have a content type derived from plone.directives.form.Schema; it has several dozen fields across four fieldsets. I'm trying to create a zope.interface.invariant that looks at fields from two different fieldsets.
From tracing the behaviour, it looks like the invariant is called once for each fieldset, but not for the entire form.
I'm aware I can provide my own handler and perform all the checks I need there, but that feels chunky compared to distinctly defined invariants. While the obvious solution is to move related fields onto the same fieldset, the current setup reflects a layout that is logical to the end user.
Is there an existing hook where I could perform validation on multiple fields across fieldsets?
The answer seems to be no: z3c.form.group.Group.extractData calls z3c.form.form.BaseForm.extractData once for each group/fieldset, and this call already includes invariant validation.
Instead of registering your own handler, you could also override extractData:
import datetime

from five import grok
from plone.directives import form, dexterity
from z3c.form.interfaces import ActionExecutionError, WidgetActionExecutionError
from zope.interface import Invalid
# ... (IMyEvent and the _ message factory come from your own package)

class EditForm(dexterity.EditForm):
    grok.context(IMyEvent)

    def extractData(self, setErrors=True):
        data, errors = super(EditForm, self).extractData(setErrors)
        if None not in (data['start'], data['end']):
            if data['end'] < data['start']:
                raise WidgetActionExecutionError(
                    'end',
                    Invalid(_(u"End date should not lie before the start date.")))
            if data['end'] - data['start'] > datetime.timedelta(days=7):
                raise WidgetActionExecutionError(
                    'end',
                    Invalid(_(u"Duration of convention should be shorter than seven (7) days.")))
        return data, errors
Please note that this class derives from dexterity.EditForm, which includes Dexterity's default handlers, instead of form.SchemaForm.
WidgetActionExecutionError does not work reliably, though. For some fields, it produces a 'KeyError'.
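If WidgetActionExecutionError misbehaves for a particular field, one fallback is to raise a form-level ActionExecutionError instead (already imported above), which attaches the message to the whole form rather than a single widget; a minimal sketch:

# inside extractData(), after the super() call
if data['end'] < data['start']:
    raise ActionExecutionError(
        Invalid(_(u"End date should not lie before the start date.")))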