How to provide parameter input for interaction variable in H2OGradientBoostingEstimator? - h2o

I need to use the interaction variable feature of multiclass classification in H2OGradientBoostingEstimator in H2O in Python. I am not sure which parameter to use & how to use that. Can anyone please help me out with this?
Currently, I am using the below code -
pros_gbm = H2OGradientBoostingEstimator(nfolds=0,seed=1234, keep_cross_validation_predictions = False, ntrees=10, max_depth=3, learn_rate=0.01, distribution='multinomial')
hist_gbm = pros_gbm.train(x=predictors, y=target, training_frame=hf_train, validation_frame = hf_test,verbose=True)

GBM inherently creates interactions. You can extract information about feature interactions using the .feature_interaction() extractor method (for an H2O Model). More information is provided in the user guide and the Python docs.
If you want to explicitly add a new column that is the interaction between two numerics, you could create that manually by multiplying the two (or more) columns together to get a new interaction column.
For categorical interactions, there's also the the h2o.interaction() method in Python here to create interaction columns in the data (prior to sending it to the GBM or any algorithm).

Related

Is there any alternative of duckling?

Rasa v 0.15 change log states that "SpacyEntityExtractor supports same entity filtering as DucklingHTTPExtractor" but if sentence has ‘time’ entity then values identified by spacy entity extractor are just the text values(ex. 7pm) and not the real time values (ex 2019-05-03T07.00.00) as identified by duckling.
Duckling is indeed the recommended option for dates.
Is there any reason why you don't want to use Duckling?
You could also implement a custom NLU pipeline component, if you want to use a another library.

Reusing h2o model mojo or pojo file from python

As H2o models are only reusable with the same major version of h2o they were saved with, an alternative is to save the model as MOJO/POJO format. Is there a way these saved models can be reused/loaded from python code. Or is there any way to keep the model for further development when upgrading the H2O version??
If you want to use your model for scoring via python, you could use either h2o.mojo_predict_pandas or h2o.mojo_predict_csv. But otherwise if you want to load a binary model that you previously saved, you will need to have compatible versions.
Outside of H2O-3 you can look into pyjnius as Tom recommended: https://github.com/kivy/pyjnius
Another alternative is to use pysparkling, if you only need it for scoring:
from pysparkling.ml import H2OMOJOModel
# Load test data to predict
df = spark.read.parquet(test_data_path)
# Load mojo model
mojo = H2OMOJOModel.createFromMojo(mojo_path)
# Make predictions
predictions = mojo.transform(df)
# Show predictions with ground truth (y_true and y_pred)
predictions.select('your_target_column', 'prediction').show()

Is there a supported way to get list of features used by a H2O model during its training?

This is my situation. I have over 400 features, many of which are probably useless and often zero. I would like to be able to:
train an model with a subset of those features
query that model for the features actually used to build that model
build a H2OFrame containing just those features (I get a sparse list of non-zero values for each row I want to predict.)
pass this newly constructed frame to H2OModel.predict() to get a prediction
I am pretty sure what found is unsupported but works for now (v 3.13.0.341). Is there a more robust/supported way of doing this?
model._model_json['output']['names']
The response variable appears to be the last item in this list.
In a similar vein, it would be nice to have a supported way of finding out which H2O version that the model was built under. I cannot find the version number in the json.
If you want to know which feature columns the model used after you have built a model you can do the following in python:
my_training_frame = your_model.actual_params['training_frame']
which will return some frame id
and then you can do
col_used = h2o.get_frame(my_training_frame)
col_used
EDITED (after comment was posted)
To get the columns use:
col_used.columns
Also, a quick way to check the version of a saved binary model is to try and load it into h2o, if it loads it is the same version of h2o, if it isn't you will get a warning.
you can also open the saved model file, the first line will list the version of H2O used to create it.
For a model saved as a mojo you can look at the model.ini file. It will list the version of H2O.

Should I use singletableview?

I'm learning about Django tables. I first wrote a basic example, here my view:
def people1(request):
table = PersonTable(Person.objects.filter(id=2))
RequestConfig(request).configure(table)
return render(request, 'people.html', {'table': table})
This way I've been able to easily display a table with a filter condition "filter(id=2))".
After that I found SingleTableView which is supposed to be an easier way to display database tables, as an example I wrote this view, which worked fine:
from django_tables2 import SingleTableView
class PersonList(SingleTableView):
template_name = 'ta07/comun.html'
model = Person
table_class = PersonTable
Questions are: how should I do to apply filters like in the first example? And is SingleTableView better than the basic way?
I'd say for now, you should only use it for the very basic use case. As soon as you need customizations from that, use your own.
Since filtering is a very common use case, I might consider adding that to the features of SingleTableView at some point. If you need it before that, feel free to open a pull request.

How to perform a NOT query with postgres_ext

Is it possible to perform a NOT type query with chained methods using postgres_ext?
rules = Rule.where.overlap(:tags => ["foo"])
Basically want the inverse of the above. Thanks!
In regular active record you can use .where.not as described in this article: https://robots.thoughtbot.com/activerecords-wherenot however looking through the source code of postgres_ext I don't know if it is defined in that library. You may be able to construct your query in a way that uses the native active record methods.

Resources