Typical scoring parameter choices for cross-validation of a ranking classifier (rank:pairwise)

I am building an XGBoost ranking classifier using Python's xgboost.sklearn.XGBClassifier (XGBClassifier). In my problem, I try to classify ranking labels that take the values 0, 1, 2, 3. In the classifier setup I used objective = "rank:pairwise". I now want to run cross-validation with sklearn.model_selection.cross_val_score (cross_val_score).
Are there any canonical choices of scoring function for assessing classification performance on ranked outcomes?
I am thinking scoring = "neg_mean_squared_error" seems like an OK choice, as it penalizes a prediction by its distance from the true label, i.e. it accounts for the ordinal character of the outcome.
I would appreciate other comments, opinions, or experiences on this.
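For reference, a minimal sketch of the setup described above, on synthetic stand-in data (the feature matrix and labels are made up; note that newer xgboost versions route ranking objectives through XGBRanker rather than XGBClassifier):

```python
import numpy as np
from xgboost.sklearn import XGBClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real data: ordinal labels in {0, 1, 2, 3}.
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = rng.randint(0, 4, size=200)

# objective="rank:pairwise" as in the question.
clf = XGBClassifier(objective="rank:pairwise")

# neg_mean_squared_error penalizes a prediction by its squared distance
# from the true label, which respects the ordinal structure of the labels.
scores = cross_val_score(clf, X, y, cv=5, scoring="neg_mean_squared_error")
print(scores.mean())
```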

Related

Multiclass classification confidence score using predict_proba of SGDClassifier

I am using logistic regression in SGDClassifier to perform multi-class classification over ~10k categories.
To get a confidence score for each prediction I am using the predict_proba function.
But I am getting prediction probability values such as 0.00026091, 0.00049697, 0.00019632 for both correct and wrong predictions.
Please suggest a way to normalize the scores so that I can filter results by probability value.
If the probability values of all classes are very low, it might mean that your classifier has a hard time classifying the samples. You might want to do some feature engineering or try another model.
To normalize the values, have a look at scikit-learn's MinMaxScaler. This will scale the data to numbers between 0 and 1. But as I said, if the probabilities for all values are very low, you won't get a good classification result.
Hope that helps
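To make the suggestion concrete, a minimal sketch on toy data (the data and class count are made up; loss="log_loss" is the current scikit-learn name for logistic regression, older versions use loss="log"):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for the ~10k-class problem described above.
rng = np.random.RandomState(0)
X = rng.rand(300, 20)
y = rng.randint(0, 10, size=300)

# Logistic regression via SGD; predict_proba is available for this loss.
clf = SGDClassifier(loss="log_loss").fit(X, y)
proba = clf.predict_proba(X)

# MinMaxScaler scales column-wise, so transpose to rescale each sample's
# class probabilities into [0, 1] as suggested above.
scaled = MinMaxScaler().fit_transform(proba.T).T
print(proba[0].max(), scaled[0].max())
```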

How does a bagging classifier (averaging) work?

How does a bagging classifier work when averaging rather than voting? I am working on a bagging classifier and want to use an average of the models, but when I bag models the result is a continuous value rather than a categorical one. Can I use averaging here? If yes, how?
You have to give more details on what programming language and library you are using.
If you are doing regression, the bagging model can give you the average or a weighted average of the base models.
If you are doing classification, then it can be voting or weighted voting.
However, if you are doing binary classification, then the average of the 1s and 0s can be used to give you a pseudo-probability or confidence for the prediction; see the sketch below.
You can do this for non-binary classification as well, using the one-vs-all method to get probabilities for all possible classes.
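To illustrate the binary case, a minimal sketch with scikit-learn's BaggingClassifier on synthetic data (the parameter is named estimator in recent scikit-learn, base_estimator in older versions):

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = rng.randint(0, 2, size=200)  # binary labels, 0 or 1

bag = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=25)
bag.fit(X, y)

# Average the hard 0/1 votes of the base models: for binary labels this
# mean is exactly the pseudo-probability described above.
votes = np.array([est.predict(X) for est in bag.estimators_])
pseudo_proba = votes.mean(axis=0)
print(pseudo_proba[:5])
```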

How to implement decision trees in boosting

I'm implementing AdaBoost (boosting) that will use CART and C4.5. I have read about AdaBoost, but I can't find a good explanation of how to combine AdaBoost with decision trees. Say I have a data set D with n examples, and I split D into TR training examples and TE testing examples.
Say TR.count = m,
so I set the weights to 1/m each, then I use TR to build a tree, test it on TR to find the misclassified examples, and test on TE to calculate the error. Then I update the weights. Now how do I get the next training set? What kind of sampling should I use (with or without replacement)? I know that the new training set should focus more on the misclassified samples, but how can I achieve this? And how will CART or C4.5 know that they should focus on the examples with greater weight?
As far as I know, the TE set is not meant to be used to estimate the error rate during boosting; the raw data can be split into two parts (one for training, the other for cross-validation). There are mainly two ways to apply the weights to the training-set distribution, and which one to use depends on the weak learner you choose.
How to apply the weights?
1. Re-sample the training data set according to the weights (typically with replacement, so that high-weight examples can appear multiple times). This method is known as boosting by resampling. The resampled data set contains misclassified instances with higher probability than correctly classified ones, which forces the weak learning algorithm to concentrate on the misclassified data.
2. Use the weights directly when learning. Models that support this include Bayesian classification and decision trees (C4.5 and CART). In C4.5, we calculate the information gain (mutual information) to determine which predictor is selected as the next node, so we can combine the weights with the entropy calculation by viewing the weights as the probability mass of each sample in the distribution. Given X = [1, 2, 3, 3] with weights [3/8, 1/16, 3/16, 6/16]: normally the entropy of X is -0.25 log(0.25) - 0.25 log(0.25) - 0.5 log(0.5), but with the weights taken into account the weighted entropy is -(3/8) log(3/8) - (1/16) log(1/16) - (9/16) log(9/16). In general, C4.5 can be implemented with weighted entropy; the standard algorithm is the special case with uniform weights [1, 1, ..., 1]/N.
If you want to implement AdaBoost.M1 with C4.5, you should read page 339 of The Elements of Statistical Learning.
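As a concrete check of the weighted-entropy arithmetic above, a small sketch (the numbers come from the example; log base 2 is assumed, though the base does not affect the comparison):

```python
import numpy as np

def weighted_entropy(values, weights):
    """Entropy of a discrete sample whose observations carry weights.

    With uniform weights [1/N, ..., 1/N] this reduces to the ordinary
    entropy, matching the description above.
    """
    values = np.asarray(values)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize to a distribution
    probs = np.array([weights[values == v].sum() for v in np.unique(values)])
    return float(-(probs * np.log2(probs)).sum())

X = np.array([1, 2, 3, 3])

# Uniform weights reproduce the plain entropy from the example:
# -0.25 log(0.25) - 0.25 log(0.25) - 0.5 log(0.5)
print(weighted_entropy(X, [1/4, 1/4, 1/4, 1/4]))  # 1.5 in log base 2

# The boosting weights merge the two 3s into a mass of 9/16:
# -(3/8) log(3/8) - (1/16) log(1/16) - (9/16) log(9/16)
print(weighted_entropy(X, [3/8, 1/16, 3/16, 6/16]))
```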

Metrics for evaluating ranking algorithms

I have created an algorithm that ranks entities. I was wondering what my metrics should be for evaluating it. Are there any algorithms of this type to which I can compare mine?
Normalized discounted cumulative gain (NDCG) is one of the standard methods for evaluating ranking algorithms. You will need to provide a score for each of the recommendations that you give. If your algorithm assigns a low (better) rank to a high-scoring entity, your NDCG score will be higher, and vice versa.
The score can depend on the aspects of the query.
You can also manually create a gold data set, with each result assigned a score, and then use these scores to calculate NDCG.
Note that what I call score is referred to as relevance (rel_i, the relevance of the i-th result) in the usual formulas.
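For concreteness, a minimal sketch of the standard NDCG computation (the rel_i values are the relevance scores described above; the common log2(i + 1) discount is assumed):

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain: rel_i discounted by log2 of its position."""
    relevances = np.asarray(relevances, dtype=float)
    positions = np.arange(1, len(relevances) + 1)
    return (relevances / np.log2(positions + 1)).sum()

def ndcg(relevances):
    """NDCG: DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = np.sort(relevances)[::-1]
    return dcg(relevances) / dcg(ideal)

# Relevance scores in the order your algorithm ranked the results;
# a perfect ranking would list them in descending order (NDCG = 1).
print(ndcg([3, 2, 3, 0, 1]))
```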

Classification algorithms whose classifications can be evaluated as percentages

I'm implementing different classification algorithms to predict the outcome of soccer matches (home, draw or away). In order to compare the classifiers, their classifications are evaluated as percentages.
At the moment I'm using k-nearest neighbours (counting the neighbours of each class to convert to percentages) and naive Bayes.
Besides kNN and naive Bayes, which classifiers can be used for this task?
Support vector machines are probably the most common classifiers appearing in the literature right now, and there are several random forest classification schemes as well. Look at Weka for a Java package supporting those methods (and others). R also has a lot of tools for machine learning, so you could quickly test other algorithms without having to implement them yourself.
A logistic model naturally expresses its output as probabilities. For soccer, quite a few people have modelled the goals scored by each side as a Poisson process, with rates depending on the relative strengths of the defence and attack concerned.
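To illustrate, a minimal sketch of a logistic model producing home/draw/away percentages on synthetic data (the features are made-up stand-ins for real match statistics):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical match features (e.g. team strength ratings); synthetic here.
rng = np.random.RandomState(0)
X = rng.rand(500, 6)
y = rng.choice(["home", "draw", "away"], size=500)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# predict_proba gives one percentage-like probability per outcome class.
print(model.classes_)
print(model.predict_proba(X[:3]))
```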
