What is the meaning of the value of the boosted tree? - graphviz

I plotted a tree, and at the ends of the branches (in the leaves) some values are shown. What do they mean?
import xgboost as xgb

# model parameters
param = {
    'colsample_bytree': 0.4,
    'objective': 'binary:logistic',
    'learning_rate': 0.05,
    'eval_metric': 'auc',
    'max_depth': 8,
    'min_child_weight': 4,
    'seed': 7,
}
n_estimators = 5000
# create and train model
# (dtrain and best_iteration are defined elsewhere and are not shown here)
bst = xgb.train(param,
                dtrain,
                num_boost_round = best_iteration)
dot = xgb.to_graphviz(bst, rankdir='LR')
dot.render("trees1")
I thought it was a predicted probability score, but the leaf values only range up to about 0.01, whereas predicted probabilities range up to 1. Maybe it means the predicted probability divided by 10 (e.g. a leaf value of 0.01 means a predicted probability of 0.1)?
And why do some leaves have negative values (e.g. -0.01)?
Thank you.

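The value of a leaf is not your "eval_metric" (the AUC) and it is not a probability either. With objective 'binary:logistic', each leaf holds a raw score in log-odds (margin) space, already scaled by the learning rate. The prediction for a row is the sum of the leaf values it reaches across all trees, plus the base score, passed through the sigmoid. That is why a single leaf value is small, and why it can be negative: a negative leaf pushes the predicted probability down.
For comparison, a scikit-learn decision tree (not an XGBoost booster) exposes its structure through these attributes: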
n_nodes = estimator.tree_.node_count
children_left = estimator.tree_.children_left
children_right = estimator.tree_.children_right
feature = estimator.tree_.feature
threshold = estimator.tree_.threshold
From the docs: https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html#sphx-glr-auto-examples-tree-plot-unveil-tree-structure-py
It is not listed on that page, but "tree_.impurity" exists as well.
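For the XGBoost question itself, a minimal sketch may help show how leaf values relate to a predicted probability. It assumes the 'binary:logistic' objective and a base margin of 0 (which corresponds to a base score of 0.5). The name predict_proba_from_leaves and the example leaf values are hypothetical, and the code deliberately does not call XGBoost so that the arithmetic stays visible:

```python
import math


def sigmoid(z):
    """Map a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba_from_leaves(leaf_values, base_margin=0.0):
    """Sum the leaf value reached in each tree, then apply the sigmoid.

    leaf_values: one raw leaf score per tree (hypothetical numbers here).
    base_margin: assumed starting log-odds; 0.0 corresponds to p = 0.5.
    """
    margin = base_margin + sum(leaf_values)
    return sigmoid(margin)


# One tree contributing 0.01 barely moves the prediction away from 0.5.
one_tree = predict_proba_from_leaves([0.01])
# A hundred trees each contributing 0.01 add up to a margin of about 1.0.
many_trees = predict_proba_from_leaves([0.01] * 100)
# Negative leaves push the probability below 0.5.
negative = predict_proba_from_leaves([-0.01] * 100)

print(one_tree, many_trees, negative)
```

If this reading is right, it can be checked against the real model by comparing bst.predict(dtrain, output_margin=True) with the summed leaf values for a few rows.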

Related

Tune a learner with the searchspace parameter setting

I am trying to tune a ranger learner with the searchspace parameter setting. The goal is to find the optimal K (the number of input indicators; I used a filter pipe and set importance.filter.nfeat) and D (the depth of each tree, i.e., classif.ranger.max.depth) by grid search. D should not be greater than the number of input indicators K, so the values searched for D are set proportionally to K: D ∈ {10%, 25%, 50%, 100%} ∗ K. Values of D ≤ 0 are rejected.
However, I am unfamiliar with writing function code within the search space, so I cannot achieve this (D ends up greater than K).
My question is:
How do I set a parameter that is based on another one in the search space? (I think this is different from the depends mentioned in the mlr3 book.)
Here is my code:
ranger = lrn("classif.ranger", importance = "impurity", predict_type = "prob", id = "ranger")
graph = po("filter", flt("importance"), filter.nfeat = 3) %>>% ranger %>>% po("threshold")
plot(graph)
graph_learner = GraphLearner$new(graph)
searchspace = ps(
importance.filter.nfeat = p_int(1,length(task$feature_names)),
classif.ranger.max.depth = p_int(1,length(task$feature_names)),
.extra_trafo = function(x, param_set) {x = graph_learner$param_set$importance.filter.nfeat * c(.1,.25,.50,1)})
inst1 = TuningInstanceMultiCrit$new(
task,
learner = graph_learner,
resampling = rsmp("cv"),
measures = msrs(c("classif.ce","classif.bacc","classif.mcc")),
terminator = trm("evals", n_evals = 50),
search_space = searchspace
)
tuner = tnr("grid_search")
# reduce logging output
lgr::get_logger("bbotk")$set_threshold("warn")
# The tuning procedure may take some time:
set.seed(1234)
tuner$optimize(inst1)
#Returns list with optimal configurations and estimated performance.
inst1$result
# We can plot the performance against the number of features.
#If we do so, we see the possible trade-off between sparsity and predictive performance:
arx = as.data.table(inst1$archive)
ggplot(arx, aes(x = importance.filter.nfeat, y = classif.ce)) + geom_line()
How can I find out which indicators are used in the tuned model? We only see the trade-off between sparsity and predictive performance. Are they chosen by importance rank?
I have also tried feature selection, where I could get the optimal feature set. So what is the relationship between tuning nfeat and feature selection? Which one is preferred in practice?
# https://mlr3gallery.mlr-org.com/posts/2020-09-14-mlr3fselect-basic/
resampling = rsmp("cv")
measure = msr("classif.mcc")
terminator = trm("none")
ranger_lrn = lrn("classif.ranger", importance = "impurity", predict_type = "prob")
#
instance = FSelectInstanceSingleCrit$new(
task = task,
learner = ranger_lrn,
resampling = resampling,
measure = measure,
terminator = terminator,
store_models = TRUE)
#
fselector = fs("rfe", recursive = FALSE)
set.seed(1234)
fselector$optimize(instance)
#
as.data.table(instance$archive)
instance$result
instance$result_feature_set
instance$result_y
# set new feature_set
# task$select(instance$result_feature_set)
Does this answer question 1? See "How to set specific values in `paradox`?"
It seems you could simply set up your own data table as shown there, remove the rows where D > K, and then use the design_points tuner.
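The table of design points suggested above can be sketched as follows. The thread itself uses R and mlr3; this is only a language-neutral illustration in Python of which (K, D) rows would survive, and the function name design_points is hypothetical. It assumes D is rounded down to a whole number, which the question does not specify:

```python
def design_points(n_features, percentages=(10, 25, 50, 100)):
    """Build the (K, D) pairs to evaluate.

    K runs over 1..n_features and D is a percentage of K, rounded down
    (an assumption). Rows with D <= 0 or D > K are dropped, and duplicate
    rows are merged.
    """
    rows = set()
    for k in range(1, n_features + 1):
        for pct in percentages:
            d = (pct * k) // 100  # integer arithmetic avoids float rounding
            if 1 <= d <= k:
                rows.add((k, d))
    return sorted(rows)


for k, d in design_points(4):
    print("importance.filter.nfeat =", k, " classif.ranger.max.depth =", d)
```

Each surviving row would become one row of the data table handed to the design_points tuner, so D can never exceed K by construction.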

Why do the trace values have periods of (unwanted) stability?

I have a fairly simple test data set I am trying to fit with pymc3.
The result generated by traceplot looks something like this.
Essentially, the traces of all parameters look like a standard 'caterpillar' for 100 iterations, followed by a flat line for 750 iterations, followed by the caterpillar again.
The initial 100 iterations happen after 25,000 ADVI iterations and 10,000 tune iterations. If I change these amounts, the periods of unwanted stability appear or disappear seemingly at random.
I'm wondering if anyone has advice on what is causing this and how I can stop it from happening.
Thanks.
The full code is below. In brief, I am generating a set of 'phases' (-pi -> pi) with a corresponding set of values y = a(j)*sin(phase) + b(j)*cos(phase). a and b are drawn for each subject j at random, but are related to each other.
I then essentially try to fit this same model.
Edit: Here is a similar example, running for 25,000 iterations. Something goes wrong around iteration 20,000.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
%matplotlib inline
np.random.seed(0)
n_draw = 2000
n_tune = 10000
n_init = 25000
init_string = 'advi'
target_accept = 0.95
##
# Generate some test data
# Just generates:
# x a vector of phases
# y a vector corresponding to some sinusoidal function of x
# subject_idx a vector corresponding to which subject x is
#9 Subjects
N_j = 9
#Each with 276 measurements
N_i = 276
sigma_y = 1.0
mean = [0.1, 0.1]
cov = [[0.01, 0], [0, 0.01]] # diagonal covariance
x_sub = np.zeros((N_j,N_i))
y_sub = np.zeros((N_j,N_i))
y_true_sub = np.zeros((N_j,N_i))
ab_sub = np.zeros((N_j,2))
tuning_sub = np.zeros((N_j,1))
sub_ix_sub = np.zeros((N_j,N_i))
for j in range(0,N_j):
    aj,bj = np.random.multivariate_normal(mean, cov)
    #aj = np.abs(aj)
    #bj = np.abs(bj)
    xj = np.random.uniform(-1,1,size = (N_i,1))*np.pi
    # sort down the column for convenience; the default axis would leave an (N_i,1) array unchanged
    xj = np.sort(xj, axis=0)
    yj_true = aj*np.sin(xj) + bj*np.cos(xj)
    yj = yj_true + np.random.normal(scale=sigma_y, size=(N_i,1))
    x_sub[j,:] = xj.ravel()
    y_sub[j,:] = yj.ravel()
    y_true_sub[j,:] = yj_true.ravel()
    ab_sub[j,:] = [aj,bj]
    tuning_sub[j,:] = np.sqrt(((aj**2)+(bj**2)))
    sub_ix_sub[j,:] = [j]*N_i
x = np.ravel(x_sub)
y = np.ravel(y_sub)
subject_idx = np.ravel(sub_ix_sub)
subject_idx = np.asarray(subject_idx, dtype=int)
##
# Fit model
hb1_model = pm.Model()
with hb1_model:
    # Hyperpriors
    hb1_mu_a = pm.Normal('hb1_mu_a', mu=0., sd=100)
    hb1_sigma_a = pm.HalfCauchy('hb1_sigma_a', 4)
    hb1_mu_b = pm.Normal('hb1_mu_b', mu=0., sd=100)
    hb1_sigma_b = pm.HalfCauchy('hb1_sigma_b', 4)
    # We fit a mixture of a sine and cosine with these two coefficients
    # allowed to be different for each subject
    hb1_aj = pm.Normal('hb1_aj', mu=hb1_mu_a, sd=hb1_sigma_a, shape = N_j)
    hb1_bj = pm.Normal('hb1_bj', mu=hb1_mu_b, sd=hb1_sigma_b, shape = N_j)
    # Model error
    hb1_eps = pm.HalfCauchy('hb1_eps', 5)
    hb1_linear = hb1_aj[subject_idx]*pm.math.sin(x) + hb1_bj[subject_idx]*pm.math.cos(x)
    hb1_linear_like = pm.Normal('y', mu = hb1_linear, sd=hb1_eps, observed=y)
with hb1_model:
    hb1_trace = pm.sample(draws=n_draw, tune = n_tune,
                          init = init_string, n_init = n_init,
                          target_accept = target_accept)
pm.traceplot(hb1_trace)
To partially answer my own question: After playing with this for a while, it looks like the problem might be due to the hyperprior standard deviation going to 0. I am not sure why the algorithm should get stuck there though (testing a small standard deviation can't be that uncommon...).
In any case, two solutions that seem to alleviate the problem (although they don't remove it entirely) are:
1) Add an offset to the definitions of the standard deviation. e.g.:
offset = 1e-2
hb1_sigma_a = offset + pm.HalfCauchy('hb1_sigma_a', 4)
2) Instead of using a HalfCauchy or HalfNormal for the SD prior, use a Lognormal distribution parameterized so that values near 0 are unlikely.
I'd look at the divergences, as explained in notes and literature on Hamiltonian Monte Carlo; see, e.g., here and here.
with hb1_model:
    np.savetxt('diverging.csv', hb1_trace['diverging'])
As a quick-and-dirty fix, you could perhaps try increasing target_accept.
Good luck!
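To see where the flat stretches sit relative to the saved divergence flags, one option is to locate the longest run of repeated values in a single parameter's trace. The sketch below is plain Python and makes no PyMC3 calls. The name longest_flat_run is hypothetical, and it assumes the samples of one parameter are available as a sequence of numbers (for example taken from hb1_trace['hb1_sigma_a']):

```python
def longest_flat_run(samples, tol=0.0):
    """Return (start_index, length) of the longest run of consecutive
    samples that stay within tol of the first value of that run."""
    best_start, best_len = 0, 0
    run_start = 0
    for i in range(1, len(samples) + 1):
        # Close the current run at the end of the data or when the value moves.
        if i == len(samples) or abs(samples[i] - samples[run_start]) > tol:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = i
    return best_start, best_len


# Hypothetical trace: a stuck chain repeats the same value.
trace = [0.31, 0.52, 0.52, 0.52, 0.52, 0.18, 0.44]
start, length = longest_flat_run(trace)
print("longest flat run starts at", start, "and lasts", length, "draws")
```

If the run boundaries line up with draws flagged in diverging.csv, that would support the idea that the sampler gets stuck where the hyperprior standard deviation approaches 0.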

How to add a constraint in CVaR optimization code in Matlab?

I want to find the optimal weights in a multi-asset portfolio by minimizing the CVaR.
This is the code that gives a minimum risk for a target return.
p = PortfolioCVaR('ProbabilityLevel', .99, 'AssetNames', names);
p = p.setScenarios(R); % R= asset returns
p = p.setDefaultConstraints();
wts = p.estimateFrontier(20);
portRisk = p.estimatePortRisk(wts);
portRet = p.estimatePortReturn(wts);
clf
visualizeFrontier(p, portRisk, portRet);
%% Compute portfolio with given level of return
tic;
wt = p.estimateFrontierByReturn(.05/100);
toc;
pRisk = p.estimatePortRisk(wt);
pRet = p.estimatePortReturn(wt);
The weights sum to 1. My question is how to add a constraint such that no asset can have a weight greater than 60%.
Thank you for any help you can provide.
Use the object's setBounds method, with the upper bound set to the cap you want (0.6 for a 60% limit):
>> p = setBounds(p,LowerBoundsVector,UpperBoundsVector);
See
>> doc setBounds
for more info.

Implicit recommender Tuning hyper parameters Pyspark

The computeMAPKR function takes the model, the actual data, and the validation data (user, product) and generates ratings. It then sorts the predicted ratings for every user and takes the top K to compare with the actual data, computing the Mean Average Precision at K (MAPK).
I am using this function to tune the hyperparameters, i.e. to fit multiple models and select the Lambda, Alpha, and Rank with the highest MAPK. This works for small data sets, but when the matrix becomes 10 million users * 200 products it breaks, especially at the reduceByKey step and the joins. Is there a better way to tune the hyperparameters for implicit ALS? I am using Spark 1.3.
Actual RDD is of the form (user,product)
Valid RDD is of the form (user,product)
from operator import add  # used by reduce(add) below

def apk(act_pred):
    # act_pred is a (predicted, actual, k) tuple
    predicted = act_pred[0]
    actual = act_pred[1]
    k = act_pred[2]
    if len(predicted)>k:
        predicted = predicted[:k]
    score =0.0
    num_hits = 0.0
    for i,p in enumerate(predicted):
        if p in actual and p not in predicted[:i]:
            num_hits += 1.0
            score += num_hits / (i+1.0)
    if not actual:
        return 1.0
    #return num_hits
    return (score/min(len(actual),k))

def computeMAPKR(model,actual,valid,k):
    pred = model.predictAll(valid).map(lambda x:(x[0],[(x[1],x[2])])).cache()
    gp = pred.reduceByKey(lambda x,y:x+y)
    #gp = pred.groupByKey().map(lambda x : (x[0], list(x[1])))
    # for every user, sort the items by predicted ratings and get user, item pairs
    def f(x):
        s = sorted(x,key=lambda x:x[1],reverse=True)
        # build a list so that len() and slicing work inside apk
        sm = [item for item, _ in s]
        return sm
    sp = gp.mapValues(f)
    # actual data
    ac = actual.map(lambda x:(x[0],[(x[1])]))
    #gac = ac.reduceByKey(lambda x,y:(x,y)).map(lambda x : (x[0], list(x[1])))
    gac = ac.reduceByKey(lambda x,y:x+y)
    ap = sp.join(gac)
    apk_result = ap.map(lambda x:(x[0],(x[1][0],x[1][1],k))).mapValues(apk)
    mapk = apk_result.map(lambda x :x[1]).reduce(add) / ap.count()
    #print(apk_result.collect())
    return mapk
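As a sanity check on the metric itself, the average-precision-at-k arithmetic can be run on plain Python lists before involving any RDDs. This is a small sketch that mirrors the logic of apk above, including its choice to return 1.0 when there are no actual items. The name average_precision_at_k and the example item ids are hypothetical:

```python
def average_precision_at_k(predicted, actual, k):
    """Average precision over the top-k predicted items.

    predicted: item ids ordered from best to worst predicted rating.
    actual: item ids the user really interacted with.
    """
    if not actual:
        return 1.0  # same convention as the apk function in the question
    top_k = predicted[:k]
    score = 0.0
    num_hits = 0.0
    for i, item in enumerate(top_k):
        # Count each relevant item once, at its first position.
        if item in actual and item not in top_k[:i]:
            num_hits += 1.0
            score += num_hits / (i + 1.0)
    return score / min(len(actual), k)


# Hits at ranks 1 and 3: (1/1 + 2/3) / min(2, 3) = 5/6
example = average_precision_at_k([10, 20, 30, 40], [10, 30], k=3)
print(example)
```

A handful of hand-checked users like this makes it easier to confirm that a distributed rewrite still computes the same number.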

How to implement Roulette Wheel Selection and Rank Selection in Matlab code for the Traveling Salesman Problem?

I have an assignment coding a genetic algorithm for the traveling salesman problem. I've written some code giving correct results using Tournament Selection.
The problem is, I have to do Wheel and Rank and the results I get are incorrect.
Here is my code using Tournament Selection:
clc;
clear all;
close all;
nofCities = 30;
initialPopulationSize = nofCities*nofCities;
generations = nofCities*ceil(nofCities/10);
cities = floor(rand([nofCities 2])*100+1);
figure;
hold on;
scatter(cities(:,1), cities(:,2), 5, 'b','fill');
line(cities(:,1), cities(:,2));
line(cities([1 end],1), cities([1 end],2));
axis([0 110 0 110]);
population = zeros(initialPopulationSize ,nofCities);
for i=1:initialPopulationSize
population(i,:) = randperm(nofCities);
end
distanceMatrix = zeros(nofCities);
for i=1:nofCities
for j=1:nofCities
if (i==j)
distanceMatrix(i,j)=0;
else
distanceMatrix(i,j) = sqrt((cities(i,1)-cities(j,1))^2+(cities(i,2)-cities(j,2))^2);
end
end
end
for u=1:generations
tourDistance = zeros(initialPopulationSize ,1);
for i=1:initialPopulationSize
for j=1:length(cities)-1
tourDistance(i) = tourDistance(i) + distanceMatrix(population(i,j),population(i,j+1));
end
end
for i=1:initialPopulationSize
tourDistance(i) = tourDistance(i) + distanceMatrix(population(i,end),population(i,1));
end
min(tourDistance)
newPopulation = zeros(initialPopulationSize,nofCities);
for k=1:initialPopulationSize
child = zeros(1,nofCities);
%tournament start
for i=1:5
tournamentParent1(i) = ceil(rand()*initialPopulationSize);
end
p1 = find(tourDistance == min(tourDistance([tournamentParent1])));
parent1 = population(p1(1), :);
for i=1:5
tournamentParent2(i) = ceil(rand()*initialPopulationSize);
end
p2 = find(tourDistance == min(tourDistance([tournamentParent2])));
parent2 = population(p2(1), :);
%tournament end
%crossover
startPos = ceil(rand()*(nofCities/2));
endPos = ceil(rand()*(nofCities/2)+10);
for i=1:nofCities
if (i>startPos && i<endPos)
child(i) = parent1(i);
end
end
for i=1:nofCities
if (isempty(find(child==parent2(i))))
for j=1:nofCities
if (child(j) == 0)
child(j) = parent2(i);
break;
end
end
end
end
newPopulation(k,:) = child;
end
%mutation
mutationRate = 0.015;
for i=1:initialPopulationSize
if (rand() < mutationRate)
pos1 = ceil(rand()*nofCities);
pos2 = ceil(rand()*nofCities);
mutation1 = newPopulation(i,pos1);
mutation2 = newPopulation(i,pos2);
newPopulation(i,pos1) = mutation2;
newPopulation(i,pos2) = mutation1;
end
end
population = newPopulation;
u
end
figure;
hold on;
scatter(cities(:,1), cities(:,2), 5, 'b','fill');
line(cities(population(i,:),1), cities(population(i,:),2));
line(cities([population(i,1) population(i,end)],1), cities([population(i,1) population(i,end)],2));
axis([0 110 0 110]);
%close all;
What I want is to replace the tournament code with wheel and rank code.
Here is what I wrote for the Wheel Selection:
fitness = tourDistance./sum(tourDistance);
wheel = cumsum(fitness);
parent1 = population(find(wheel >= rand(),1),:);
parent2 = population(find(wheel >= rand(),1),:);
Here is a vectorized implementation of a roulette wheel selection in Matlab:
[~,W] = min(ones(popSize,1)*rand(1,2*popSize) > ((cumsum(fitness)*ones(1,2*popSize)/sum(fitness))),[],1);
This assumes that the fitness input into the selection scheme is a matrix of size (popSize x 1) (or a column vector of the same size as the number of population members).
Here popSize is the number of members in your population, and W holds the winners, i.e. the population members that are selected to become parents for crossover.
The output of the selection, W, is a row vector (of doubles) of size 2*popSize that holds the indices of all the members of the population that will be used in the crossover stage.
This row vector can then be input into a vectorized crossover scheme that could look something like this:
%% Single-Point Preservation Crossover
Pop2 = Pop(W(1:2:end),:); % Pop2 Winners 1
P2A = Pop(W(2:2:end),:); % Pop2 Winners 2
Lidx = sub2ind(size(Pop),[1:popSize]',round(rand(popSize,1)*(genome-1)+1));
vLidx = P2A(Lidx)*ones(1,genome);
[r,c]=find(Pop2==vLidx);
[~,Ord]=sort(r);
r = r(Ord); c = c(Ord);
Lidx2 = sub2ind(size(Pop),r,c);
Pop2(Lidx2) = Pop2(Lidx);
Pop2(Lidx) = P2A(Lidx);
This crossover assumes an input of the W variable from the selection scheme. It also uses Pop, which holds the population members in a popSize by genome matrix (genome is the number of cities in one tour, which is also the length of the genome). The genome is stored as an array of integers, with each integer being a city, and the tour is the order of the cities from the array's first index to its last index.
While we are at it, we may as well include a nice vectorized mutation scheme for a permutation genetic algorithm (which this is).
%% Mutation (Permutation)
idx = rand(popSize,1)<mutRate;
Loc1 = sub2ind(size(Pop2),1:popSize,round(rand(1,popSize)*(genome-1)+1));
Loc2 = sub2ind(size(Pop2),1:popSize,round(rand(1,popSize)*(genome-1)+1));
Loc2(idx == 0) = Loc1(idx == 0);
[Pop2(Loc1), Pop2(Loc2)] = deal(Pop2(Loc2), Pop2(Loc1));
This mutation randomly swaps the positions of 2 cities in our tour (genome).
Finally make sure to update your population after all of that work we did!
%% Update Population!
Pop = Pop2; % updates the population to include crossovers and mutation.
I know this reply is probably way too late for your assignment, but hopefully it will help someone else with a similar problem.
I REALLY REALLY recommend that anyone interested in vectorized genetic algorithms in Matlab read this paper: UCL: Efficiently Vectorized Code for Population Based Optimization Algorithms
It is what I based all of the code in the examples on, and it will teach you why the code is written that way. It's a great resource and what got me started with GAs.
For wheel selection to work, you should start by designing a fitness measure in which fitter individuals have a larger value, in contrast to the distance, where better individuals have a smaller value. Then your approach with cumsum should work.
Where is the issue with ranking selection?
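The point about fitness direction can be illustrated with a short sketch. The thread's code is Matlab; this is an illustration in Python under stated assumptions, not the asker's or the answerer's implementation. It assumes fitness = 1/distance for the wheel and a linear ranking (shortest tour gets weight n, longest gets 1) for rank selection. The names roulette_select, inverse_distance_fitness and rank_weights are hypothetical:

```python
import bisect
import itertools
import random


def roulette_select(weights, rng):
    """Return an index chosen with probability proportional to its weight."""
    wheel = list(itertools.accumulate(weights))  # same idea as cumsum
    r = rng.random() * wheel[-1]
    # First slot whose cumulative weight exceeds r; clamp against rounding.
    return min(bisect.bisect_right(wheel, r), len(weights) - 1)


def inverse_distance_fitness(tour_distances):
    """Shorter tours get larger fitness (one possible choice)."""
    return [1.0 / d for d in tour_distances]


def rank_weights(tour_distances):
    """Linear ranking: the shortest tour gets weight n, the longest gets 1."""
    n = len(tour_distances)
    order = sorted(range(n), key=lambda i: tour_distances[i])  # best first
    weights = [0] * n
    for rank, idx in enumerate(order):
        weights[idx] = n - rank
    return weights


rng = random.Random(0)
distances = [120.0, 480.0, 95.0, 300.0]  # hypothetical tour lengths

wheel_parent = roulette_select(inverse_distance_fitness(distances), rng)
rank_parent = roulette_select(rank_weights(distances), rng)
print("wheel picked tour", wheel_parent, "and rank picked tour", rank_parent)
```

Rank selection reuses the same wheel and only changes the weights, so very short tours cannot dominate the draw the way they can with raw 1/distance fitness.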

Resources