Boosting Hyperparameter for LightGBM: does 'goss' use 'gbdt' as underlying model?

For the "boosting" parameter in lightgbm: https://lightgbm.readthedocs.io/en/latest/Parameters.html, is 'gbdt' used as the underlying model for 'goss'?

Yes. GOSS is a weighted sampling version of GBDT: the trees themselves are grown exactly as in 'gbdt', and GOSS only changes which data each iteration trains on, keeping the instances with large gradients and randomly subsampling the rest. Please refer to the GOSS algorithm in the LightGBM paper ("LightGBM: A Highly Efficient Gradient Boosting Decision Tree", NIPS 2017): http://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradi
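For illustration, a minimal sketch with hypothetical data: GOSS is selected purely through the boosting parameter, and top_rate / other_rate (from the same parameter page) control the one-side sampling.

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.random((500, 10))
y = rng.random(500)

params = {
    "objective": "regression",
    "boosting": "goss",  # GOSS sampling on top of the usual gbdt trees
    "top_rate": 0.2,     # fraction of large-gradient instances always kept
    "other_rate": 0.1,   # fraction of the remaining instances randomly sampled
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=50)

(Newer LightGBM releases express the same choice as boosting="gbdt" with data_sample_strategy="goss".)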

Related

BayesSearchCV of LGBMRegressor: how to weight samples in both training and CV scoring?

While optimizing LightGBM hyperparameters, I'd like to individually weight samples during both training and CV scoring. From the BayesSearchCV docs, it seems one way to do that could be to put an LGBMRegressor sample_weight key into the BayesSearchCV fit_params option. But this is not clear, because both BayesSearchCV and LGBMRegressor have fit methods.
To which fit method does the BayesSearchCV fit_params go? And is using fit_params really the way to weight samples during both training and CV scoring?
Based on the documentation, I believe fit_params is passed as an argument when BayesSearchCV() is instantiated, not when its .fit() method is called; its contents are then forwarded as keyword arguments to the estimator's fit on each CV split.
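A minimal sketch under that reading, with hypothetical data and search space (recent scikit-optimize versions also accept **fit_params passed directly to .fit, as in scikit-learn's own search classes, and the underlying scikit-learn machinery should slice array-valued entries such as sample_weight down to each training fold). Note that this weights training only: the default scorers ignore sample_weight, so weighting the CV scoring as well would require a custom scorer.

import numpy as np
from lightgbm import LGBMRegressor
from skopt import BayesSearchCV

rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = rng.random(200)
w = rng.random(200)  # hypothetical per-sample weights

opt = BayesSearchCV(
    estimator=LGBMRegressor(),
    search_spaces={"num_leaves": (15, 63), "learning_rate": (0.01, 0.3, "log-uniform")},
    n_iter=16,
    cv=3,
    fit_params={"sample_weight": w},  # forwarded to LGBMRegressor.fit
)
opt.fit(X, y)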

Threshold for spark XGBoost Classification model

How do I set an optimal threshold for an XGBoost classifier? The default value used in the algorithm is 0.5. I wanted to know if there is any feature or built-in function I can use to change this.
If using Python: you are looking for the predict_proba() API instead of the usual predict() API. With predict_proba() you get probabilities, which can then be mapped to classes depending on a threshold value.
Since you mentioned Spark MLlib, you might be using Scala or Java with xgboost4j. Options exist in those cases too; for example, in https://xgboost.readthedocs.io/en/latest/jvm/scaladocs/xgboost4j/ml/dmlc/xgboost4j/scala/Booster.html#predict(data:ml.dmlc.xgboost4j.scala.DMatrix,outPutMargin:Boolean,treeLimit:Int):Array[Array[Float]] you are looking for outPutMargin.
For deciding the threshold you can use a ROC curve, or evaluate your business outcome against the xgboost score, e.g. if all cases below a score of 0.8 are loss-making, then you can set the threshold to 0.8.
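For the Python case, a minimal sketch with hypothetical data: take the positive-class probability from predict_proba() and apply your own cutoff, optionally picking it from a ROC curve as suggested above.

import numpy as np
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=50).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]       # positive-class probability

fpr, tpr, thresholds = roc_curve(y_te, proba)
threshold = thresholds[np.argmax(tpr - fpr)]  # Youden's J; or a business cutoff such as 0.8
pred = (proba >= threshold).astype(int)       # replaces the default 0.5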

How to deal with categorical variables in a Calibrated Classifier?

I am working with a calibration curve for a CatBoost model.
cat = CatBoostClassifier()
calib = CalibratedClassifierCV(base_estimator=cat, method='sigmoid', cv=2)
calib.fit(XX, yy, cat_features=??)
How can I deal with categorical variables in the fit of calibrated classifier?
Thanks :)
You need to pass the categorical feature indices to the model constructor. In your case:
cat=CatBoostClassifier(cat_features=categorical_positions)
and then continue as you wrote.
categorical_positions is a list of categorical feature indices.
The problem is that sklearn's CalibratedClassifierCV doesn't support string values. To get around this, change the string values of the categorical features to integer values (for example, enumerate them). CatBoost will still treat them as categorical, because they are listed in the cat_features parameter of CatBoostClassifier, so the metrics will be the same.
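Putting it together, a minimal sketch with hypothetical toy data: the categorical column is enumerated as integers so CalibratedClassifierCV can pass it through, while cat_features on the constructor keeps CatBoost treating it as categorical.

import numpy as np
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.calibration import CalibratedClassifierCV

rng = np.random.default_rng(0)
XX = pd.DataFrame({
    "color": rng.integers(0, 3, size=300),  # categorical, enumerated as integers
    "x1": rng.random(300),                  # ordinary numeric feature
})
yy = rng.integers(0, 2, size=300)

categorical_positions = [0]  # index of the "color" column

cat = CatBoostClassifier(cat_features=categorical_positions, verbose=0)
# base_estimator= was renamed to estimator= in newer scikit-learn releases
calib = CalibratedClassifierCV(base_estimator=cat, method='sigmoid', cv=2)
calib.fit(XX, yy)  # no cat_features here; it is fixed on the constructor
probs = calib.predict_proba(XX)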

Cross-validating H2OStackedEnsembleEstimator?

H2O docs claim that "for all algos that support the nfolds parameter" cross-validation is done by the train method.
However, H2OStackedEnsembleEstimator does not:
H2OValueError: Unknown parameter nfolds = 5
So, how do I cross-validate such a model?
The CV parameter for Stacked Ensemble is called metalearner_nfolds instead of nfolds. This is to emphasize that the cross-validation applies to the metalearning algorithm. The full list of Stacked Ensemble parameters is in the H2O documentation.
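A minimal sketch, assuming the h2o Python API; train, x, and y are placeholders for your frame, predictor columns, and response.

from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator

# Base learners must share fold assignments and keep their
# cross-validated predictions so the ensemble can stack them.
gbm = H2OGradientBoostingEstimator(
    nfolds=5,
    fold_assignment="Modulo",
    keep_cross_validation_predictions=True,
)
gbm.train(x=x, y=y, training_frame=train)  # x, y, train: placeholders

# Cross-validate the metalearner with metalearner_nfolds, not nfolds
# (which H2OStackedEnsembleEstimator rejects with H2OValueError).
ensemble = H2OStackedEnsembleEstimator(
    base_models=[gbm],
    metalearner_nfolds=5,
)
ensemble.train(x=x, y=y, training_frame=train)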

Optimization with only non-linear objective and all linear constraints

I am using the LINDO API to solve a non-linear optimization problem with non-linearity in only the objective. I am loading the constraint coefficients using LSloadLPData and calculating the value of the objective in a callback function set via LSsetFuncalc. Is it necessary to call LSloadNLPData? If yes, what should the values be for the indices of non-linear variables in each column (since all constraints are linear)?
You should call LSloadNLPData to load the indices of the nonlinear variables in the objective function. See the following sample in your LINDO API installation folder:
lindoapi\samples\c\ex_nlp9_uc
