In the example regression, which is the target column? - lightgbm

I have run the regression example here:
https://github.com/microsoft/LightGBM/tree/master/examples/regression
One simple question: which columns in regression.train and regression.test are the features, and which column is used as the target?

When using the LightGBM command-line interface (CLI), if you do not provide a value for the parameter label_column in train.conf, then the first column in the training data will be used as the target.
This can be seen in the source code for DatasetLoader, the class LightGBM uses to create a LightGBM Dataset object from a file.
label_idx_ ("column to use for the target") is initialized to 0: https://github.com/microsoft/LightGBM/blob/a8ee487aca35363fafa027e6b7695976045096b3/src/io/dataset_loader.cpp#L22
label_idx_ is only overridden if label_column is provided in the config: https://github.com/microsoft/LightGBM/blob/a8ee487aca35363fafa027e6b7695976045096b3/src/io/dataset_loader.cpp#L48
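As a cross-check, the same default can be reproduced with the Python API. This is a minimal sketch, assuming the example files are tab-separated with no header row and the target in column 0 (which is what the CLI default implies):
import pandas as pd
import lightgbm as lgb

# regression.train in the example is tab-separated with no header; the label is column 0
train = pd.read_csv("regression.train", sep="\t", header=None)
y_train = train[0]                      # first column -> target (label_column default)
X_train = train.drop(columns=[0])       # remaining columns -> features

dtrain = lgb.Dataset(X_train, label=y_train)
booster = lgb.train({"objective": "regression"}, dtrain, num_boost_round=100)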

Related

How to provide parameter input for interaction variable in H2OGradientBoostingEstimator?

I need to use the interaction-variable feature for multiclass classification with H2OGradientBoostingEstimator in H2O's Python API. I am not sure which parameter to use or how to use it. Can anyone please help me out with this?
Currently, I am using the below code -
pros_gbm = H2OGradientBoostingEstimator(nfolds=0, seed=1234, keep_cross_validation_predictions=False, ntrees=10, max_depth=3, learn_rate=0.01, distribution='multinomial')
hist_gbm = pros_gbm.train(x=predictors, y=target, training_frame=hf_train, validation_frame=hf_test, verbose=True)
GBM inherently creates interactions. You can extract information about feature interactions using the .feature_interaction() extractor method (for an H2O Model). More information is provided in the user guide and the Python docs.
If you want to explicitly add a new column that is the interaction between two numerics, you could create that manually by multiplying the two (or more) columns together to get a new interaction column.
For categorical interactions, there's also the h2o.interaction() method in Python to create interaction columns in the data (prior to sending it to the GBM or any other algorithm).
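A minimal sketch of both approaches; the file path and column names (x1, x2 numeric; cat_a, cat_b categorical; target as the response) are placeholders, and feature_interaction() is only available for tree-based models in recent H2O releases:
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()
hf_train = h2o.import_file("train.csv")              # placeholder path
hf_train["target"] = hf_train["target"].asfactor()   # multinomial needs a categorical response

# Numeric interaction: multiply two columns into a new feature
hf_train["x1_x2"] = hf_train["x1"] * hf_train["x2"]

# Categorical interactions: h2o.interaction() builds interaction columns
inter = h2o.interaction(hf_train, factors=["cat_a", "cat_b"],
                        pairwise=True, max_factors=100, min_occurrence=1)
hf_train = hf_train.cbind(inter)

# Train the GBM on the augmented frame, then inspect the learned interactions
pros_gbm = H2OGradientBoostingEstimator(ntrees=10, max_depth=3, learn_rate=0.01,
                                        distribution='multinomial')
pros_gbm.train(x=[c for c in hf_train.columns if c != "target"], y="target",
               training_frame=hf_train)
interactions = pros_gbm.feature_interaction()        # recent H2O versions only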

Reusing h2o model mojo or pojo file from python

As H2O models are only reusable with the same major version of H2O they were saved with, an alternative is to save the model in MOJO/POJO format. Is there a way these saved models can be reused/loaded from Python code? Or is there any way to keep the model for further development when upgrading the H2O version?
If you want to use your model for scoring via Python, you could use either h2o.mojo_predict_pandas or h2o.mojo_predict_csv. Otherwise, if you want to load a binary model that you previously saved, you will need compatible versions.
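For example, a minimal sketch using h2o.mojo_predict_pandas; the paths and feature names are placeholders, and depending on your h2o version you may need to point genmodel_jar_path at a local h2o-genmodel.jar:
import pandas as pd
import h2o

# Rows to score, with the same feature columns the model was trained on
new_data = pd.DataFrame({"feature_1": [1.0, 2.0], "feature_2": [0.5, 0.1]})

preds = h2o.mojo_predict_pandas(dataframe=new_data,
                                mojo_zip_path="model.zip",
                                genmodel_jar_path="h2o-genmodel.jar")
print(preds)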
Outside of H2O-3 you can look into pyjnius as Tom recommended: https://github.com/kivy/pyjnius
Another alternative is to use pysparkling, if you only need it for scoring:
from pysparkling.ml import H2OMOJOModel
# Load test data to predict
df = spark.read.parquet(test_data_path)
# Load mojo model
mojo = H2OMOJOModel.createFromMojo(mojo_path)
# Make predictions
predictions = mojo.transform(df)
# Show predictions with ground truth (y_true and y_pred)
predictions.select('your_target_column', 'prediction').show()

Is there a supported way to get list of features used by a H2O model during its training?

This is my situation. I have over 400 features, many of which are probably useless and often zero. I would like to be able to:
train a model with a subset of those features
query that model for the features actually used to build that model
build an H2OFrame containing just those features (I get a sparse list of non-zero values for each row I want to predict)
pass this newly constructed frame to H2OModel.predict() to get a prediction
I am pretty sure what I found is unsupported, but it works for now (v 3.13.0.341). Is there a more robust/supported way of doing this?
model._model_json['output']['names']
The response variable appears to be the last item in this list.
In a similar vein, it would be nice to have a supported way of finding out which H2O version the model was built under. I cannot find the version number in the JSON.
If you want to know which feature columns the model used after you have built it, you can do the following in Python:
my_training_frame = your_model.actual_params['training_frame']
which will return a frame id, and then you can do
col_used = h2o.get_frame(my_training_frame)
col_used
EDITED (after comment was posted)
To get the columns use:
col_used.columns
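Putting these pieces together, a minimal sketch; your_model is an already-trained model, new_rows_df is a placeholder pandas DataFrame with your sparse rows expanded, and it assumes actual_params also records response_column:
import h2o

frame_id = your_model.actual_params['training_frame']
col_used = h2o.get_frame(frame_id)
response = your_model.actual_params['response_column']
feature_cols = [c for c in col_used.columns if c != response]

# Build a frame containing just the features the model was trained on and predict
new_frame = h2o.H2OFrame(new_rows_df)
preds = your_model.predict(new_frame[feature_cols])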
Also, a quick way to check the version of a saved binary model is to try to load it into H2O: if it loads, it was built with the same version of H2O; if it wasn't, you will get a warning.
You can also open the saved model file; the first line will list the version of H2O used to create it.
For a model saved as a MOJO, you can look at the model.ini file. It will list the version of H2O.
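Since a MOJO is just a zip archive, model.ini can also be read programmatically. A minimal sketch, assuming a placeholder path and that the file records an h2o_version entry:
import zipfile

with zipfile.ZipFile("model.zip") as z:               # placeholder MOJO path
    with z.open("model.ini") as f:
        for line in f.read().decode("utf-8").splitlines():
            if "h2o_version" in line:
                print(line.strip())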

Yellow Interface as Source Table in ODI using SDK

How to assign a temporary target datastore of a pre-existing yellow interface as the source table while creating an interface using ODI SDK?
For a simple source table, the code would go as:
OdiDataStore SourceDS = ((IOdiDataStoreFinder)odiInstance.getTransactionalEntityManager().getFinder(OdiDataStore.class)).findByName(table_Name, model_Name);
I've tried getting the interface as an OdiInterface object and using getTargetDataStore() or getUnderlyingTable() on it, but it doesn't work.
Using the OdiInterface object or the target datastore both work with an instance of InterfaceActionAddSourceDataStore.
The 3rd and 4th constructors are the ones of interest for using a temporary datastore as a source:
InterfaceActionAddSourceDataStore(OdiInterface.TargetDataStore pDataStore, DataSet pDataSet, IAliasComputer pAliasComputer, IClauseImporter pClauseImporter, IAutoMappingComputer pAutoMappingComputer)
InterfaceActionAddSourceDataStore(OdiInterface pInterface, DataSet pDataSet, IAliasComputer pAliasComputer, IClauseImporter pClauseImporter, IAutoMappingComputer pAutoMappingComputer)
The performAction method actually applies the change.

Using Master Document for report generation in Enterprise Architect V11 - store configuration and filters

Hello, fellows!
I have a problem with report generation from my model using a Master Document defining a complex documentation. I believe some of you can help me solve the problem.
Facts:
I have a complex project consisting of several views and packages, including a domain model, use case model, business process model, etc.
The model is stored in a shared (database) repository along with other projects.
I have created custom templates, TOC, cover page and stylesheets for the documentation.
I have created a Master Document package with the main template assigned defining the main document I want to have generated.
I have created several Model Document elements in that package to define individual chapters of the document, assigning adequate templates and model packages to each of them.
I have successfully generated the desired documentation.
I am using Enterprise Architect version 11.0.1107
Problem 1:
I would like to generate several variants of the same documentation. Thus, I need to change the settings of the generation process, such as the options, exclude filters, and element filters.
However, the settings are not remembered after generation, and I have to set everything again each time I generate documentation from the Master Document package.
Is there a way to save the settings for the Master Document? I have found the Report Specification element, but it does not work as expected (see Problem 2).
Problem 2:
I have tried to use Report Specification element to save the settings for the report generation. I have created that element in the same package as the Master Document is located, and also inside of that Master Document package.
In both cases, when generating the documentation for the first time, EA asked me to select the package. I selected the Master Document package and confirmed the generation. However, the generated document is empty as it clearly does not take the Model Document elements in the selected package into account.
Did I use the Report Specification incorrectly? Should I use another package for the Report Specification element? Should I select another package when using the Report Specification for report generation?
Problem 3:
I tried to apply element filters and some other options to include only some of the elements in the report. Let's say I want only elements with version 1.1, so I set the filter to "version = 1.1" when generating the report from the Master Document package.
However, the report contained all elements, regardless of their version. The same happened when I tried to exclude anonymous elements. Furthermore, on the next try the filter settings were lost again and I had to set them again before the next generation (see Problem 1).
Where should I configure the filters? Should they be set when generating from the Master Document package? Should they be set somewhere on the Model Document elements? Should they be set in the templates (thus making them very specific rather than general)? In that case, should they be set for the main template or for the individual fragments?
Summary:
If you have any tips on combining the Master Document and Report Specification, or on using element filters when generating from the Master Document, I would be very grateful.
We don't use the Report Specification element; instead, we store specific options in a Resource Document. Use the [Resource Document] button on the Generate Documentation dialog. This specification is stored among the Resources (see the Resources window) under Document Generation -> Defined Documents. I hope this solves Problems 1 and 2.
Problem 3: EA has several ways to select elements for generation; unfortunately, you can't combine them (as far as I know). I would try to define a custom find filter and use it.
