I want to run KNeighborsRegressor and KNeighborsClassifier on the GPU. However, I cannot find them in RAPIDS. Are these estimators currently implemented, or are they planned?
The underlying code for KNN is there in RAPIDS 0.10, but it does not yet expose the Regressor and Classifier interfaces. You can use cuML's KNN to find the nearest neighbors themselves, though. In 0.11, we will support these interfaces like in sklearn.
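Until those wrappers land, you can build the regressor/classifier behavior yourself on top of raw neighbor search. A minimal CPU sketch using scikit-learn's NearestNeighbors as a stand-in (cuML's nearest-neighbor class exposes a similar kneighbors call; the data here is made up for illustration):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors  # stand-in for the cuML equivalent

X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_reg = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])  # regression targets
y_cls = np.array([0, 0, 0, 1, 1, 1])                 # class labels

nn = NearestNeighbors(n_neighbors=3).fit(X_train)
X_query = np.array([[1.0], [11.0]])
_, idx = nn.kneighbors(X_query)  # indices of the 3 nearest training points

# "KNeighborsRegressor": average the neighbors' targets
reg_pred = y_reg[idx].mean(axis=1)

# "KNeighborsClassifier": majority vote over the neighbors' labels
cls_pred = np.array([np.bincount(row).argmax() for row in y_cls[idx]])

print(reg_pred)  # 1.0 and 11.0
print(cls_pred)  # 0 and 1
```

The forthcoming interfaces essentially package exactly this averaging/voting step behind a fit/predict API.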
Related
I am building my first time-series prediction model with scikit-learn's LinearRegression(). I also came across statsmodels' AutoReg(), ARMA(), and SARIMAX(). Unfortunately, from the literature I could not figure out how to categorize them. Are they alternatives to LinearRegression()? Are they ML? Are they fundamentally different?
I'd appreciate a hint on where to look further. Thanks.
All three fit variants of Seasonal Autoregressive Integrated Moving Average with eXogenous Variables (SARIMAX) models.
AutoReg
AutoReg is limited to autoregressive models only, and so does not include Seasonal or Moving Average components. It does support exogenous regressors, as well as complex deterministic processes such as Fourier series to model multiple seasonalities. Parameters are estimated using OLS, which is equivalent to conditional maximum likelihood. Since parameters are estimated using OLS, estimation is very fast and completely deterministic.
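Because the estimator is just OLS on lagged values, fitting an AR(p) reduces to a single least-squares solve. A minimal numpy sketch on simulated data (the coefficients 0.6 and -0.2 are arbitrary choices for illustration):

```python
import numpy as np

# Simulate an AR(2) process: y_t = 0.6*y_{t-1} - 0.2*y_{t-2} + noise
rng = np.random.default_rng(0)
n, phi1, phi2 = 2000, 0.6, -0.2
y = np.zeros(n)
for t in range(2, n):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + rng.normal()

# Conditional OLS: regress y_t on [1, y_{t-1}, y_{t-2}]
X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
beta, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
print(beta)  # approximately [0, 0.6, -0.2]
```

This is also why the fit is deterministic: there is no iterative optimizer involved, just one linear-algebra call.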
ARIMA
ARIMA is a restricted version of SARIMAX that does not include Seasonal components or Exogenous regressors. Because it excludes these two types of terms, it can offer additional fitting options that are not available when fitting a full SARIMAX model. These alternative estimators have different statistical properties than the maximum likelihood estimator, which is the only method available in SARIMAX (ARIMA also supports maximum likelihood). Many of these alternative parameter estimation methods are also faster than ML.
SARIMAX
SARIMAX supports all features of ARIMA plus the two additional components. It can only be estimated using maximum likelihood. ML uses numerical methods to maximize the likelihood function, so estimation of some series/models may run into convergence difficulties.
The examples page is the best place to look to see the detailed use of these models. Many of the notebooks include both code examples and LaTeX markup that explains the underlying math.
I am a beginner with H2O, and I want to know whether there are any attribute (feature) selection capabilities in the H2O framework that can be applied to H2OFrames.
No, there are currently no feature selection functions in H2O. My advice would be to use Lasso regression (in H2O, this means GLM with alpha = 1.0) to do the feature selection, or simply let whatever machine learning algorithm you are planning to use (e.g. GBM) see all the features. Such algorithms tend to ignore the bad features, but having them in the training data can still degrade performance.
If you'd like, you can make a feature request by filling out a ticket on the H2O-3 JIRA. This seems like a nice feature to have.
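The GLM-with-alpha=1.0 advice above is plain Lasso. The same idea in scikit-learn (a CPU analogue for illustration, not H2O code, with made-up data) shows how the L1 penalty drives irrelevant coefficients exactly to zero, which is what makes it usable for feature selection:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1]  # only the first two features matter

lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)  # coefficients for features 2-4 are driven to exactly zero
selected = np.flatnonzero(lasso.coef_ != 0)
print(selected)     # the surviving (selected) feature indices
```

In H2O you would read the zero/nonzero pattern off the GLM coefficient table in the same way.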
In my opinion, yes.
My approach is to use AutoML to train on your data. After training, you get many models. Use the h2o.get_model method (or the H2O server page) to inspect a model you like; from it you can get the VARIABLE IMPORTANCES frame, and then pick your features based on that.
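The variable-importance route works the same way outside H2O. A scikit-learn analogue (illustrative data, not H2O code) of pulling importances from a trained tree ensemble and ranking features by them:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 5.0 * X[:, 0] + rng.normal(scale=0.1, size=300)  # only feature 0 matters

model = GradientBoostingRegressor(random_state=0).fit(X, y)
print(model.feature_importances_)  # feature 0 should dominate
ranking = np.argsort(model.feature_importances_)[::-1]
print(ranking)  # feature indices, most important first
```

In H2O, `varimp()` on a trained model gives you the equivalent table to rank and cut on.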
I'm running the HiCO and HiSC clustering algorithms on my dataset. If I'm not mistaken, the algorithms use different approaches to define relevant subspaces for clusters in the first step, and in the second step they apply OPTICS for clustering. I'm getting only a cluster order file after running the algorithms.
Is there any way to extract clusters from it? For example, something like OPTICSXi? (I know there are three extraction methods under hierarchical clustering, but I can't see anything for HiCO or HiSC.)
Thank you in advance for any hints
Use OPTICSXi as the algorithm, then use HiCO or HiSC "inside".
The Xi extraction can be parameterized to use a different OPTICS variant like HiCO, HiSC, and DeLi-Clu. It just defaults to using regular OPTICS.
-algorithm clustering.optics.OPTICSXi
-opticsxi.algorithm de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.HiCO

or, for HiSC:

-algorithm clustering.optics.OPTICSXi
-opticsxi.algorithm de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.HiSC
We don't have implementations of the other extraction methods in ELKI yet, sorry.
I'm having difficulties using the ELKI MiniGUI to run spatial outlier detection algorithms. Many of the algorithms require a list of KNN for each object in the database. It appears that a KNN label list first needs to be created from the spatial coordinate database only, not including the attributes. Then, I suppose the spatial outlier detection algorithms are run on the attribute database along with the external file of the spatial KNN.
My Java experience is limited, so I would like to use ELKI in the command line and use the MiniGUI to assemble code for each task. However, with the MiniGUI I have only been able to create, or materialize, external files for 1) the triangular distance matrix and 2) the KNN Distance Order, which seems to include the object itself as one of the KNN. It seems that I really need an external file, or cached data, of a list of each object and their spatial neighbors. Maybe a KNN Query, KNN Join, precomputed distances or preprocessed database filter would be helpful, but I really don't know.
What steps are needed to create and use files, or cached data, that are required to supply the KNN spatial relation for the spatial outlier detection attribute relation of each object to its neighbors? I am unclear of how to do this with the MiniGUI, especially since it looks like the spatial neighborhood relation needs to be created first, before using it with the spatial outlier detection algorithm and the attribute database.
Any advice is greatly appreciated.
Thanks!
Thank you for contributing a how-to to the ELKI wiki!
How to perform geo-spatial outlier detection with an external neighborhood specification
It is a nice step-by-step introduction to using ELKI, and I hope others will find it useful.
I'm posting this as an "answer" here so that others can find it easily.
I have written a program that computes SURF features and then uses FLANN (Fast Library for Approximate Nearest Neighbors) to match and show the nearest neighbours. Can the use of FLANN be considered machine learning? My understanding is that it is an approximate version of k-nearest-neighbour search, which is considered a machine learning algorithm (supervised learning).
You will find mention of methods like FLANN, LSH, Spectral Hashing, and KD-tree (variants) in a lot of machine learning publications.
However, as you said, these methods themselves are not learners/classifiers, but they may often be used within typical machine learning applications. Per your example, FLANN is not a supervised classifier, but it can be used to significantly improve taggers and recommenders.
(That said, this question may be more appropriate for CrossValidated or the proposed Machine Learning forum.)
FLANN is just an approximate nearest neighbor search structure; that's not machine learning.
But your K-nearest-neighbor classifier that uses FLANN is machine learning.
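To make the distinction concrete: the index only answers "which stored points are closest?", while the classifier adds labeled training data and a voting rule on top. A pure-Python sketch with a brute-force index standing in for FLANN (FLANN would just make the query step approximate and fast):

```python
from collections import Counter

class NearestNeighborIndex:
    """Search structure only -- no labels, nothing learned (FLANN's role)."""
    def __init__(self, points):
        self.points = points

    def query(self, q, k):
        # Brute-force: rank stored points by squared distance to the query
        order = sorted(range(len(self.points)),
                       key=lambda i: sum((a - b) ** 2
                                         for a, b in zip(self.points[i], q)))
        return order[:k]

class KNNClassifier:
    """The machine-learning part: stores labels and votes over neighbors."""
    def __init__(self, index, labels, k=3):
        self.index, self.labels, self.k = index, labels, k

    def predict(self, q):
        neighbors = self.index.query(q, self.k)
        return Counter(self.labels[i] for i in neighbors).most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
clf = KNNClassifier(NearestNeighborIndex(points), labels)
print(clf.predict((0.2, 0.2)))  # "a"
print(clf.predict((5.5, 5.5)))  # "b"
```

Swapping the brute-force index for FLANN changes the speed and exactness of the search, but the "learning" (memorizing labeled examples and generalizing by majority vote) lives entirely in the classifier layer.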