Is there a supported way to get list of features used by a H2O model during its training? - h2o

This is my situation. I have over 400 features, many of which are probably useless and often zero. I would like to be able to:
train an model with a subset of those features
query that model for the features actually used to build that model
build a H2OFrame containing just those features (I get a sparse list of non-zero values for each row I want to predict.)
pass this newly constructed frame to H2OModel.predict() to get a prediction
I am pretty sure what found is unsupported but works for now (v 3.13.0.341). Is there a more robust/supported way of doing this?
model._model_json['output']['names']
The response variable appears to be the last item in this list.
In a similar vein, it would be nice to have a supported way of finding out which H2O version that the model was built under. I cannot find the version number in the json.

If you want to know which feature columns the model used after you have built a model you can do the following in python:
my_training_frame = your_model.actual_params['training_frame']
which will return some frame id
and then you can do
col_used = h2o.get_frame(my_training_frame)
col_used
EDITED (after comment was posted)
To get the columns use:
col_used.columns
Also, a quick way to check the version of a saved binary model is to try and load it into h2o, if it loads it is the same version of h2o, if it isn't you will get a warning.
you can also open the saved model file, the first line will list the version of H2O used to create it.
For a model saved as a mojo you can look at the model.ini file. It will list the version of H2O.

Related

Updating metadata using Java and iCAFE

This answer explains how to update metadata.
How to manipulate image metadata in ICAFE
I wanted to update both the EXIF and the IPTC sections, retaining existing metadata values. The Metadata.insertExif allows updating EXIF using the update parameter but cannot be used with the Metadata.insertMetadata(metaList, fin,fout) where I have created a List in order to update multiple sections within the metadata (EXIF and IPTC) e.g. List metaList = new ArrayList<>();
I assume I can do it using two passes - once to update EXIF and once to update IPTC. Is there a way to do this in one pass i.e. retrieve existing EXIF and then update EXIF and IPTC?
This library is not a photo editing software in itself so it is hard to find a point where the user can achieve most of the goals they want and in the mean while don't get it two complicated in the implementation. This is how it looks for now.
That said, it is not possible to insert with update more than one metadata at the same time. You can definitely do this in two passes as you mentioned in your question. You can also do it a different way which makes manipulating existing metadata more flexible.
To do that, you can use Metadata.removeMetadata () to remove both Exif and IPTC. This method will keep the removed metadata as return value so you can later do whatever you want to them and then use Metadata.insertMetadata () to insert the new Exif and IPTC back to the intermediate file which has the two metadata removed (if any) in one shot.

Possible to set file name for h2o.save_model() (rather then simply use the model_id value)?

Trying to save an h2o model with some specific name that differs from the model's model_id field, but trying something like...
h2o.save_model(model=model,
path='/some/path/then/filename',
force=False)
just creates a dir/file structure like
some
|__path
|__then
|__filename
|__<model_id>
as opposed to
some
|__path
|__then
|__filename
Is this possible to do from the save_model method?
I can't / hesitate to simply change the model_id before calling the save method because the model names have timestamps appended to them to avoid name collisions with other models that may be on the h2o cluster (am trying to remove these timestamps when saving on disk and simplifying the name on the cluster before saving creates a time where naming collision can occur if other processes are also attempting to save such a model (of, say, a different timestamp)).
Any way to get this behavior or other common alternatives / workarounds?
This is currently not possible, however I created a feature request here. There is a related question here which shows a solution for R (could be adapted to Python). The work-around is just to rename the file manually using a few lines of R/Python code.

Reusing h2o model mojo or pojo file from python

As H2o models are only reusable with the same major version of h2o they were saved with, an alternative is to save the model as MOJO/POJO format. Is there a way these saved models can be reused/loaded from python code. Or is there any way to keep the model for further development when upgrading the H2O version??
If you want to use your model for scoring via python, you could use either h2o.mojo_predict_pandas or h2o.mojo_predict_csv. But otherwise if you want to load a binary model that you previously saved, you will need to have compatible versions.
Outside of H2O-3 you can look into pyjnius as Tom recommended: https://github.com/kivy/pyjnius
Another alternative is to use pysparkling, if you only need it for scoring:
from pysparkling.ml import H2OMOJOModel
# Load test data to predict
df = spark.read.parquet(test_data_path)
# Load mojo model
mojo = H2OMOJOModel.createFromMojo(mojo_path)
# Make predictions
predictions = mojo.transform(df)
# Show predictions with ground truth (y_true and y_pred)
predictions.select('your_target_column', 'prediction').show()

Using Master Document for report generation in Enterprise Architect V11 - store configuration and filters

fellows!
I have a problem with report generation from my model using a Master Document defining a complex documentation. I believe some of you can help me solve the problem.
Facts:
I have a complex project consisting of several views and packages, inluding domain model, use case model, business process model, etc.
The model is stored in shared (database) repository along with other projects.
I have created custom templates, TOC, cover page and stylesheets for the documentation.
I have created a Master Document package with the main template assigned defining the main document I want to have generated.
I have created several Model Document elements in that package to define individual chapters of the document, assigning adequate templates and model packages to each of them.
I have successfully generated the desired documentation.
I am using Enterprise Architect version 11.0.1107
Problem 1:
I would like to have generated several variants of the same documentation. Thus, I need to change the settings of the generation process like the options, exclude filters and element filters.
However, the settings is not remembered after the generation and I have to set all the settings again when generating documentation on the Master Document package.
Is there a way to save the settings for the Master Document? I have found the Report Specification element, but it does not work as expected (see Problem 2).
Problem 2:
I have tried to use Report Specification element to save the settings for the report generation. I have created that element in the same package as the Master Document is located, and also inside of that Master Document package.
In both cases, when generating the documentation for the first time, EA asked me to select the package. I selected the Master Document package and confirmed the generation. However, the generated document is empty as it clearly does not take the Model Document elements in the selected package into account.
Did I use the Report Specification incorectly? Should I use another package for the Report Specification element? Should I select another package when using the Report Specification for report generation?
Problem 3:
I tried to apply element filters and some other options to include only some of the elements in the report. Let's say I want only element with the version 1.1. So I set the filters to "version = 1.1" when generating the report from the Master Document package.
However, the report contained all elements, regardless their version. The same happened when I tried to exclude anonymous elements. Furthermore, for the next try the filter settings was lost again and I had to set it again before next generation (see Problem 1).
Where should I configure the filters? Should it be set when generating using the Master Document package? Should it be set somewhere for the Model Document elements? Should it be set in the templates (thus making them very specific rather than general)? In such case, should it be set for the model template or for the individual fragments?
Summary:
If you have any tips for combination of Master Document and Report Specification, as well as using the element filters when generating from the Master Document, I would be very grateful.
We do'nt use Report Specification element, but we store specific options into Resource Document. Use [Resource Document] button on Generate Documentation dialog. This specification is stored between Resources (see Resources Window) under Document Generation -> Defined Documents.I hope this solves problem 1 and 2
Problem 3 - EA has more ways, how select elements for generation, unfortunatelly you can't combine them (as far as I know). I would try to define custom find filter and use it.

How to add components in to an existing GUI created by guide?

I just created a GUI using guide in MATLAB for a small project I'm working on. I have amongst other things two text fields for from and to dates. Now I'd like to get rid of them and use a Java date select tool. Of course this is not possible using guide so I need to add them manually.
I've managed to get them to show up by putting this code into my Opening_Fcn,
uicomponent(handles, 'style','com.jidesoft.combobox.DateChooserPanel','tag','til2');
using UICOMPONENT.
But even though it shows up I can't access the date select's attributes, for example
get(handles.til2)
returns
??? Reference to non-existent field 'til2'.
How can I fix this?
Unless you edit the saved GUI figure, the basic handles structure will not include your new component by default.
One way to access you component is to store the handle via guidata, by adding the following to your opening function:
handles.til2 = uicomponent(handles, 'style','com.jidesoft.combobox.DateChooserPanel','tag','til2');
guidata(hObject,handles)
Functions that need to access the handle need the line
handles = guidata(hObject)
to return the full handles structure that includes the filed til2

Resources