Can someone help with the definition of Periodic Loss in H2O's implementation of Generalized Low Rank Models (GLRM)?
https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/glrm.html
I'm trying to build a regression model with XGBoost regressor.
Right now inference takes approximately 0.025 seconds, but I'd like to speed things up.
Can someone explain to me what can influence the inference speed? For example, max_depth, number of trees, number of features... I don't know very much on this topic and I didn't find a satisfying answer on the internet. Thank you!
I'm using a sentiment analysis API and want to know how the bias that enters through the training data, and other biases, can be quantified. Any help would be appreciated.
There are several tools developed to deal with it:
Fairlearn: https://fairlearn.github.io/
Interpretability Toolkit: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability
With Fairlearn you can measure how biased an ML model is after it has been trained on the dataset, and choose a model that may be less accurate but performs better with respect to bias. The interpretability tools show how the inputs correlate with the outputs; combined with Fairlearn, they can give an idea of the health of the ML model.
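One simple way to put a number on this kind of bias is to compare how often the model predicts the positive class for each group (often called the demographic parity difference). The sketch below is plain Python and does not use the Fairlearn API; the function names, the group labels "a" and "b", and the toy predictions are all made up for illustration.

```python
from collections import defaultdict

def selection_rates(predictions, groups):
    """Return the fraction of positive (1) predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        if pred == 1:
            positives[group] += 1
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest per-group selection rate."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical sentiment predictions (1 = positive) for texts from two groups.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(preds, groups)
gap = demographic_parity_difference(preds, groups)
print(rates, gap)
```

A gap of 0 means both groups receive positive predictions at the same rate; the larger the gap, the more the model treats the groups differently on this particular measure. It is only one of several possible fairness metrics.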
I am new to H2O. I installed H2O Driverless AI with an evaluation license. I can successfully perform visualisation and classification model prediction, but I'm wondering how to start with clustering, because I can't find any option for unsupervised learning or clustering. Where should I perform clustering in Driverless AI? Is clustering available in Driverless AI or not?
Thanks in advance.
Currently, as of version 1.2.0, unsupervised clustering is not supported in DAI; DAI is designed to solve supervised learning problems.
Here are the currently supported problem types (please review the documentation for changes in future releases at http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/release_notes.html):
Problem types supported:
Regression (continuous target variable, for age, income, house price, loss prediction, time-series forecasting)
Binary classification (0/1 or "N"/"Y", for fraud prediction, churn prediction, failure prediction, etc.)
Multinomial classification (0/1/2/3 or "A"/"B"/"C"/"D" for categorical target variables, for prediction of membership type, next-action, product recommendation, etc.)
I am currently exploring PU learning, that is, learning from positive and unlabeled data only. One of the publications [Zhang, 2009] asserts that it is possible to learn by modifying the loss function of a binary classifier with probabilistic output (for example, Logistic Regression). The paper states that one should optimize Balanced Accuracy.
Vowpal Wabbit currently supports five loss functions [listed here]. I would like to add a custom loss function where I optimize for AUC (ROC), or equivalently, following the paper: 1 - Balanced_Accuracy.
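For reference, here is a minimal sketch of the quantity being discussed, 1 - Balanced_Accuracy, computed on hard predictions. It assumes the usual definition of balanced accuracy as the mean of the true positive rate and the true negative rate, with labels encoded as +1 and -1; the function name and the toy data are hypothetical, and this is an evaluation metric only, not a differentiable loss that Vowpal Wabbit could train on.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of true positive rate and true negative rate for +1/-1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    tpr = tp / (tp + fn)  # recall on the positive class
    tnr = tn / (tn + fp)  # recall on the negative class
    return (tpr + tnr) / 2

# Toy example: 4 positives (3 found), 2 negatives (1 found).
y_true = [1, 1, 1, 1, -1, -1]
y_pred = [1, 1, 1, -1, -1, 1]

ba = balanced_accuracy(y_true, y_pred)
ba_loss = 1 - ba
print(ba, ba_loss)
```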
I am unsure where to start. Looking at the code reveals that I need to provide the 1st and 2nd derivatives and some other info. I could also run the standard algorithm with Logistic loss and try to adjust l1 and l2 according to my objective (not sure if this is a good idea). I would be glad to get any pointers or advice on how to proceed.
UPDATE
Further searching revealed that it is impossible/difficult to optimize for AUC in online learning: answer
I found two software suites that are immediately ready to do PU learning:
(1) SVM perf from Joachims
Use the "-l 10" option here!
(2) Sofia-ml
Use the "--loop_type roc" option here!
In general, you assign "+1" labels to your positive examples and "-1" to all unlabeled ones. Then you launch the training procedure, followed by prediction.
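The labeling step can be sketched as follows. This is only an illustration: it assumes the training file uses the SVM-light sparse text format (label followed by 1-based index:value pairs), so check each tool's documentation for the exact input format. The helper names and the toy data are hypothetical.

```python
def to_pu_labels(is_positive):
    """Map known positives to +1 and every unlabeled example to -1."""
    return [1 if flag else -1 for flag in is_positive]

def to_svmlight_line(label, features):
    """Format one example as '<label> <index>:<value> ...', skipping zeros."""
    parts = [f"{label:+d}"]
    for idx, val in enumerate(features, start=1):  # indices are 1-based
        if val != 0:
            parts.append(f"{idx}:{val}")
    return " ".join(parts)

# Toy data: the first example is a known positive, the others are unlabeled.
is_positive = [True, False, False]
feature_rows = [[0.5, 0, 2], [0, 1.5, 0], [3, 0, 0.25]]

labels = to_pu_labels(is_positive)
lines = [to_svmlight_line(y, x) for y, x in zip(labels, feature_rows)]
print("\n".join(lines))
```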
Both packages report some performance metrics. I would suggest using the standardized and well-established binary from the KDD Cup '04: "perf". Get it here.
I hope this helps those wondering how this works in practice. Perhaps I have prevented the XKCD case.
I had been using Mahout 0.9's Naive Bayes algorithm to classify document data. For a specific train (2/3 of the data) and test (1/3 of the data) split, I was getting accuracy of about 86%. When I switched to Spark's MLlib, the accuracy dropped to 82%. In both cases I used the Standard Analyzer.
MLlib link: https://spark.apache.org/docs/latest/mllib-naive-bayes.html
Mahout link: http://mahout.apache.org/users/classification/bayesian.html
Please help me in this regard as I have to use Spark in a production system very soon and this is a blocker for me.
I also found that MLlib takes more time to classify the data than Mahout.
Can anyone help me increase accuracy using MLlib's Naive Bayes?