How to convert Vowpal Wabbit logistic predictions to probabilities - vowpalwabbit

I have given Vowpal Wabbit a dataset with two labels and performed logistic regression on it. The problem is that it returns real numbers, ranging from negative to positive, as predictions. I would like to transform these values into probabilities of some sort. How should I go about it?
I was thinking the predicted value might be a'x, where a is the coefficient vector and x is the feature vector. If that is the case, I can apply the binomial link function directly to get the probabilities.

Use --link=logistic on the command line.
Alternatively, you can use the logistic script in vw's utl folder to convert results you have already obtained.
Please refer to "How to return predictions in the [0, 1] interval for SVMs in vowpal wabbit".
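For reference, VW's raw logistic prediction is indeed the linear score a'x, so the probability of the positive class is the standard sigmoid 1 / (1 + e^-s). A minimal Python sketch of the conversion (the file name raw_preds.txt is just a placeholder for your predictions file):
import math

def to_probability(raw_score):
    # Map a raw VW logistic score (a'x) to P(y = +1) via the sigmoid link.
    return 1.0 / (1.0 + math.exp(-raw_score))

with open("raw_preds.txt") as f:      # hypothetical file, one raw score per line
    for line in f:
        print(to_probability(float(line.split()[0])))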

Related

How to form precision-recall curve using one test dataset for my algorithm?

I'm working on a knowledge graph, more precisely in the natural language processing field. To evaluate the components of my algorithm, it is necessary to be able to separate the good candidates from the poor ones. For this purpose, we manually classified pairs in a dataset.
My system returns the relevant pairs according to the implementation logic. Now I'm able to calculate:
Precision = X
Recall = Y
To establish a complete curve I need the rest of the points (X, Y). What should I do?
Build another dataset for testing?
Split my dataset?
Or is there some other solution?
Neither of your two proposed methods. In short, a precision-recall or ROC curve is designed for classifiers with probabilistic output. That is, instead of simply producing a 0 or 1 (in the case of binary classification), you need a classifier that can provide a probability in the [0, 1] range. In sklearn, the function to do this is precision_recall_curve; note how its second parameter is called probas_pred.
To turn these probabilities into concrete class predictions, you can then set a threshold, say at 0.5. Setting such a threshold is problematic, however, since you can trade off precision against recall by varying it, and an arbitrary choice can give a false impression of a classifier's performance. To circumvent this, threshold-independent measures such as the area under the ROC or precision-recall curve are used. These place thresholds at different values, say 0.1, 0.2, 0.3, ..., 0.9, turn the probabilities into binary classes, and compute precision and recall for each such threshold.
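As a minimal sketch, assuming the ground-truth labels and the classifier's scores are already in two arrays (the toy values below are illustrative only):
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

y_true = np.array([0, 0, 1, 1, 1])                   # manually classified pairs (toy data)
probas_pred = np.array([0.1, 0.4, 0.35, 0.8, 0.9])   # classifier scores in [0, 1]

# One call yields the whole curve: precision/recall at every distinct threshold.
precision, recall, thresholds = precision_recall_curve(y_true, probas_pred)
print(auc(recall, precision))                        # area under the precision-recall curve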

Null linear regression model in vowpal wabbit

I would like to run a linear regression in Vowpal Wabbit using the null model (intercept only, for comparison purposes). Which optimizer should I use for this? Also, is the reported best constant loss that of the simple average?
A1: For linear regression, if you care about averages, you should use --loss_function squared (which is the default). If you care more about the median than the average (e.g. if you have some outliers that may greatly mess up the average), use --loss_function quantile. BTW: these are not optimizers, just loss functions. I would leave the optimizer (enhanced SGD) as is (the default), since it works very well.
A2: The best constant is the constant prediction that would give the lowest error, and the best constant loss is the average error for always predicting that constant. It is the weighted average of all your target variables. This is not the same as the intercept B in the linear-regression formula y = A1*x1 + ... + Am*xm + B: B is the free term, independent of the inputs, and is not necessarily the average of the ys.
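To make the distinction concrete, here is a tiny numpy sketch (toy data) showing that the best constant under squared loss is the mean of the targets, while a fitted intercept can be something else entirely:
import numpy as np

x = np.array([0.0, 1.0, 5.0])           # toy inputs
y = np.array([1.0, 2.0, 10.0])          # toy targets

best_constant = y.mean()                # minimizes squared error among constant predictions
best_constant_loss = np.mean((y - best_constant) ** 2)

slope, intercept = np.polyfit(x, y, 1)  # least-squares fit y ~ slope*x + intercept
print(best_constant, best_constant_loss, intercept)  # intercept != y.mean() in general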
A3: If you want to find the intercept of your model, look for the weight named Constant in your model. This requires two short steps:
# 1) Train your model from the dataset
# and save the model in human-readable (aka "inverted hash") format
vw --invert_hash model.ih your_dataset
# 2) Search for the free/intercept term in the readable model
grep '^Constant:' model.ih
The output of the grep step should be something like:
Constant:116060:-1.085126
Here 116060 is the hash slot (the weight's location in the model) and -1.085126 is the value of the intercept (assuming no hash collisions and a plain linear model).

ROC for predictions - how to define class labels

I have a set of predictions from a model, and a set of true values of the observations, and I want to create an ROC.
The quality of the prediction (in absolute error terms) is independent of the magnitude of the prediction. So I have a set of predictions (pred(1), pred(2), ..., pred(n)) and observations (obs(1), obs(2), ..., obs(n)).
Someone told me to create the elements of my binary classification vector label(i) as label(i) = ifelse(|obs(i) - pred(i)| < tol, 1, 0) and then calculate the AUC (tol is some prespecified tolerance). So for each prediction, if it is close to the corresponding observation, the corresponding label is 1; otherwise it is 0.
But I don't see how the suggested labeling is valid, as higher pred() values will not necessarily discriminate between my binary classes; i.e., the prediction values do not serve to "rank" the quality of my predictions (a given threshold does not divide my data naturally). Can someone please shed some light on what to do here? Is the suggestion above a valid one, or is an ROC inappropriate to use here?
ROC analysis is defined for binary classification, where the observed labels take two values and your predictions are any sort of numbers. There are extensions of ROC analysis to multi-class classification, but your question suggests that your observations are some sort of continuous measurement. You could binarize them (something like label(i) = ifelse(obs(i) > someValue, 1, 0)), but it would be invalid for the labels to depend on the classifier: they must be some sort of ground truth that is independent of your classifier.
Alternatively, if your observations are continuous, you should assess the quality of your predictions with a correlation coefficient or a similar measure.
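A minimal sketch of both routes, assuming obs and pred are numpy arrays (someValue below is a hypothetical cutoff defining the positive class):
import numpy as np
from sklearn.metrics import roc_auc_score

obs = np.array([0.2, 1.5, 3.1, 4.8])    # continuous observations (toy data)
pred = np.array([0.5, 1.2, 2.9, 5.0])   # model predictions

# Route 1: binarize the observations themselves (not the errors), then run ROC analysis.
some_value = 2.0                        # hypothetical cutoff
labels = (obs > some_value).astype(int)
print(roc_auc_score(labels, pred))

# Route 2: keep everything continuous and report a correlation instead.
print(np.corrcoef(obs, pred)[0, 1])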

Vowpal Wabbit Readable Model Doesnt Have weights

I wanted to do lasso regression with Vowpal Wabbit, so I used this command line:
vw --save_resume --readable_model ob/e/nsefut/VW_testing/BuyModel.VWM -d ob/e/nsefut/VW_testing/VWsell.VWF --quiet --predictions ob/e/nsefut/VW_testing/predict.VW --loss_function logistic --noconstant --l1 0.001
The readable file shows no weights for the features I used, but when I drop the --l1 parameter it shows the weights properly. Without --l1, it came up with weights like this:
1:-0.437898 994842.000000 1.000000
33340:-0.176359 201942.265625 1.006310
59044:-0.152967 201843.875000 1.002754
63438:-0.187405 202149.140625 1.015530
124204:-0.159398 201741.187500 1.002742
166130:-0.185312 201754.421875 1.013330
This suggests that all the weights are negative. But all my features are positive-valued, so the linear combination of features would be negative for every observation, resulting in a negative prediction for every observation. Yet I am seeing both positive and negative predicted labels.
Three questions:
Is my command line correct for lasso regression?
What will enable me to see the weights?
What am I missing about the all-negative weights?

Is there a special type of multivariate regression for multiple-parameter predictions?

I am trying to use multivariate regression to play basketball. Specifically, based on X, Y, and the distance from the target, I need to predict the pitch, yaw, and cannon strength. I was thinking of using multivariate regression with multiple variables for each output parameter. Is there a better way to do this?
Also, should I use solve directly for the best fit, or use gradient descent?
ElKamina's answer is correct, but one thing to note is that it is identical to doing k independent ordinary least squares regressions; that is, the same as fitting a separate linear regression from X to pitch, from X to yaw, and from X to strength. This means you are not taking advantage of correlations between the output variables. That may be fine for your application, but one alternative that does exploit correlations in the output is reduced-rank regression; somewhat related, you can also explicitly decorrelate y by projecting it onto its principal components (see PCA, also called PCA whitening in this case since you aren't reducing the dimensionality).
I highly recommend chapter 6 of Izenman's textbook "Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning" for a fairly high level overview of these techniques. If you're at a University it may be available online through your library.
If those alternatives don't perform well, there are many sophisticated non-linear regression methods that have multiple output versions (although most software packages don't have the multivariate modifications) such as support vector regression, Gaussian process regression, decision tree regression, or even neural networks.
Multivariate regression boils down to inverting the covariance matrix of the input variables (X'X below). Since inverting the matrix is tractable as long as the dimensionality is not very high (a thousand or so should be okay), you should solve directly for the best fit instead of using gradient descent.
Let n be the number of samples, m the number of input variables, and k the number of output variables, and let:
X be the input data (n x m)
Y be the target data (n x k)
A be the coefficients you want to estimate (m x k)
Then:
XA = Y
X'XA = X'Y
A = inverse(X'X)X'Y
where X' is the transpose of X.
As you can see, once you find the inverse of X'X you can calculate the coefficients for any number of output variables with just a couple of matrix multiplications.
Use any standard math tool to solve this (MATLAB/R/Python, ...).
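A minimal numpy sketch of the closed-form fit above (toy shapes; in practice, solving the linear system is preferred over forming an explicit inverse for numerical stability):
import numpy as np

n, m, k = 100, 5, 3                     # samples, input variables, output variables (toy sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(n, m))             # input data
Y = rng.normal(size=(n, k))             # target data

# A = inverse(X'X) X'Y, computed via a linear solve instead of an explicit inverse
A = np.linalg.solve(X.T @ X, X.T @ Y)   # shape (m, k)

# Equivalent to k independent ordinary-least-squares fits:
A_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(A, A_lstsq))          # True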
