Spark - Principal Component Analysis - PCA requires a number of principal components k >= 1 but was given 0 - apache-spark-mllib

I'm trying to reduce the dimensionality of my data (I have 120 columns). For that I want to apply PCA:
val data = sc.textFile("data")
val header = data.first
val rows = data.filter(l => l != header)
import org.apache.spark.mllib.feature.PCA
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
val data = rows.map { line =>
  val parts = line.split(';')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(';').map(_.toDouble)))
}.cache()
val splits = data.randomSplit(Array(0.9, 0.1), seed = 11L)
val training = splits(0).cache()
val test = splits(1)
val pca = new PCA(training.first().features.size / 2).fit(data.map(_.features))
But I'm getting this error:
java.lang.IllegalArgumentException: requirement failed: PCA requires a number of principal components k >= 1 but was given 0
at scala.Predef$.require(Predef.scala:233)
at org.apache.spark.mllib.feature.PCA.<init>(PCA.scala:33)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
Anyone have an idea why I'm having this issue?
Thanks!

You're splitting each line on ';' to obtain parts, and then in the construction of the LabeledPoint you split parts(1) on ';' again. But parts(1) is a single field that no longer contains any ';', so the resulting vector has length 1, and training.first().features.size / 2 is 1 / 2 = 0 in integer division, which is exactly the k the PCA constructor rejects.
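A minimal corrected sketch, splitting only once and (assuming the first field is the label and the remaining 119 fields are the features) building the vector from the rest:

val data = rows.map { line =>
  val parts = line.split(';')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
}.cache()

With 119 features, training.first().features.size / 2 then gives k = 59.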

Related

Runge-Kutta curve fitting extremely slow

I am currently trying to do a regression of a function calculated via an RK4 method performed on a non-linear Volterra integral of the second kind. The problem I found is that the code is extremely slow: one call of the curve_fit function (fitt) takes about 30-40 minutes to generate a fit. Overall, there will be a lot of calls to fitt before the parameters are determined, so this takes more than 6 hours to run. Is there any way to optimize this code? Thanks in advance!
from scipy.special import gamma
from ml_internal import LTInversion
from scipy.optimize import curve_fit, fsolve
from scipy.misc import derivative
from sklearn.metrics import r2_score
from math import comb, factorial
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()

# Gets the data
df = pd.read_excel('D:\\CoMat\\Fractional_fit\\optimized\\data_optimized.xlsx')
skipTime = 1
skipIndex = df[df['Time'] == skipTime].index.values[0]
xls = pd.read_excel('D:\\CoMat\\Fractional_fit\\optimized\\data_optimized.xlsx', skiprows=np.arange(1, skipIndex+1, 1))
timeDF = xls['Time']
tempDF = xls['Temp']
taDF = xls['Ta']
timeDF = timeDF - timeDF[0]
tempDF = tempDF + 273.15
t0 = tempDF[0]
ta = sum(taDF)/len(taDF)
ta = ta + 273.15

###########################################
# Splitting into intervals
h = 0.05
a = 0
b = timeDF[len(timeDF)-1]
N = int(np.round((b-a)/h))

# Each xi
def xidx(index):
    return a + h*index

# Functions in the image are written here.
def gx(t, lamda, alpha):
    return t0 * ml(lamda*(t**alpha), alpha)
gx = np.vectorize(gx)

def kernel(t, s, rad, lamda, alpha, beta):
    if t == s:
        return 0
    return (t-s)**(alpha-1) * ml_(lamda*((t-s)**alpha), alpha, alpha, 1) * (beta*(rad**4) - beta*(ta**4) - lamda*ta)
kernel = np.vectorize(kernel)

############################
# The problem is here!!!!!!
def fx(x, n, lamda, alpha, beta):
    ans = gx(x, lamda, alpha)
    for j in range(n):
        ans += (h/6)*(kernel(x, xidx(j), f0[j], lamda, alpha, beta) + 2*kernel(x, xidx(j+1/2), f1[j], lamda, alpha, beta) + 2*kernel(x, xidx(j+1/2), f2[j], lamda, alpha, beta) + kernel(x, xidx(j+1), f3[j], lamda, alpha, beta))
    return ans

#########################
f0 = np.zeros(N+1)
f0[0] = t0
f1 = np.zeros(N+1)
f2 = np.zeros(N+1)
f3 = np.zeros(N+1)
F = np.zeros((3, N+1))

def fitt(xvalue, lamda, alpha, beta):
    global f0, f1, f2, f3, F
    n = int(np.round(xvalue/h))
    f1[n] = fx(xidx(n) + 1/2, n, lamda, alpha, beta) + (h/2)*kernel(xidx(n + 1/2), xidx(n), f0[n], lamda, alpha, beta)
    f2[n] = fx(xidx(n + 1/2), n, lamda, alpha, beta)
    f3[n] = fx(xidx(n+1), n, lamda, alpha, beta) + h*kernel(xidx(n+1), xidx(n+1/2), f2[n], lamda, alpha, beta)
    if n+1 <= N:
        f0[n+1] = fx(xidx(n+1), n, lamda, alpha, beta) + (h/6)*(kernel(xidx(n+1), xidx(n), f0[n], lamda, alpha, beta) + 2*kernel(xidx(n+1), xidx(n+1/2), f1[n], lamda, alpha, beta) + 2*kernel(xidx(n+1), xidx(n+1/2), f2[n], lamda, alpha, beta) + kernel(xidx(n+1), xidx(n+1), f3[n], lamda, alpha, beta))
    if xvalue == timeDF[len(timeDF) - 1]:
        print(f0[n], n)
        returnValue = f0[n]
        f0 = np.zeros(N+1)
        f0[0] = t0
        f1 = np.zeros(N+1)
        f2 = np.zeros(N+1)
        f3 = np.zeros(N+1)
        return returnValue
    print(f0[n], n)
    return f0[n]
fitt = np.vectorize(fitt)

# Fitting, plotting and giving (Adj) R-squared
popt, pcov = curve_fit(fitt, timeDF, tempDF, p0=(-0.1317, 0.95, -1e-11), bounds=((-np.inf, 0, -np.inf), (0, 1, 0)))
print(popt)
y_fit = np.array(fitt(timeDF, popt[0], popt[1], popt[2]))
plt.scatter(timeDF, tempDF, color='ORANGE', marker='.', s=0.5)
plt.fill_between(timeDF, tempDF-0.5, tempDF+0.5, color='ORANGE', alpha=0.2)
plt.plot(timeDF, y_fit, color='RED', linewidth=1)
plt.legend(["Experimental data", "Caputo fit"], loc="upper right")
plt.xlabel("Time (min)")
plt.ylabel("Temperature (Kelvin)")
plt.show()
plt.close()
r2 = r2_score(tempDF, y_fit)
print(r2)
adjr2 = 1 - (1 - r2)*((len(xls)-1)/(len(xls)-3-1))
print(adjr2)
I already tried computing the values f0, f1, f2, f3 all at once, but the thing consuming the most time is Fn(x), which I haven't figured out how to compute all at once. If that were possible, I think the program would run much faster. PS: ml, ml_ are functions from https://github.com/khinsen/mittag-leffler.
This is the necessary function; Fn is the only one I haven't figured out yet.
There are two typing errors in the cited image. The combination of x_n and 1/2 is always meant to be the midpoint x_{n+1/2} = x_n + h/2. The second error is a duplication of x_{n+1/2} in the formula for f^{(4)}_n in its third term. The first error is probably producing errors large enough to complicate convergence and to make any limit wrong for the intended problem.
In the Simpson/RK4 step, the four fx computations can be reduced to two.
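For instance, after the midpoint typo is fixed, f1[n] and f2[n] share the value of fx at x_{n+1/2}, and f3[n] and f0[n+1] share the value at x_{n+1}, so fitt can evaluate fx once per point and reuse it. A minimal sketch of that step, using the names from the question's code:

# Sketch: evaluate the expensive fx once per quadrature point and reuse it.
x_mid = xidx(n) + h/2                        # the midpoint x_{n+1/2} (note: h/2, not 1/2)
x_end = xidx(n + 1)
fx_mid = fx(x_mid, n, lamda, alpha, beta)    # shared by f1[n] and f2[n]
fx_end = fx(x_end, n, lamda, alpha, beta)    # shared by f3[n] and f0[n+1]
f1[n] = fx_mid + (h/2)*kernel(x_mid, xidx(n), f0[n], lamda, alpha, beta)
f2[n] = fx_mid
f3[n] = fx_end + h*kernel(x_end, x_mid, f2[n], lamda, alpha, beta)
if n + 1 <= N:
    f0[n+1] = fx_end + (h/6)*(kernel(x_end, xidx(n), f0[n], lamda, alpha, beta)
                              + 2*kernel(x_end, x_mid, f1[n], lamda, alpha, beta)
                              + 2*kernel(x_end, x_mid, f2[n], lamda, alpha, beta)
                              + kernel(x_end, x_end, f3[n], lamda, alpha, beta))

Since fx(x, n, ...) itself loops over all previous intervals, halving the number of fx calls roughly halves the dominant cost of each fitt call.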
The F_n implement the left side of the integral equation

F(x) = g(x) + ∫_0^x K(x, s, f(s)) ds

where the integral is approximated with the sample sequences f0, ..., f3. Due to the structure of the problem and the algorithm, F_n(x_n) = f^0_n = f^4_{n-1}.
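Concretely (assuming the notation above), the question's fx implements the Simpson-type approximation

F_n(x) = g(x) + (h/6) * Σ_{j=0}^{n-1} [ K(x, x_j, f^0_j) + 2 K(x, x_{j+1/2}, f^1_j) + 2 K(x, x_{j+1/2}, f^2_j) + K(x, x_{j+1}, f^3_j) ].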
Note that K(x,s,f) should be set to zero for s >= x. In the exact version of the equation these values "above the diagonal" are not used.
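In code this is a one-line guard at the top of the question's kernel (a sketch against its existing signature):

def kernel(t, s, rad, lamda, alpha, beta):
    if s >= t:   # zero "above the diagonal"; the exact equation never uses these values
        return 0
    return (t-s)**(alpha-1) * ml_(lamda*((t-s)**alpha), alpha, alpha, 1) * (beta*(rad**4) - beta*(ta**4) - lamda*ta)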
If an increase in accuracy is needed, for instance to avoid divergence where there is none in the exact solution, you can decrease the step size by a factor of 10 and then sub-sample the f^0_n sequence to produce the numerical guess for the given data. Other factors than 10 are of course also possible.
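A sketch of that refinement, reusing a, b, t0 and h from the question's code (f0_fine stands for the array the recursion would fill on the finer grid):

factor = 10                       # refinement factor; other factors work too
h_fine = h / factor
N_fine = int(np.round((b - a) / h_fine))
f0_fine = np.zeros(N_fine + 1)    # to be filled by the same recursion run with h_fine
f0_fine[0] = t0
# ... after the fine-grid recursion has filled f0_fine:
f0_coarse = f0_fine[::factor]     # every factor-th sample lands back on the original grid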

plot decision paths from all of the test samples

There is this code obtained from:
How to display the path of a Decision Tree for test samples?
Basically it plots the decision path of a sample over the decision tree plot, to show how a specific prediction was made:
import pydotplus
from sklearn.datasets import load_iris
from sklearn import tree

clf = tree.DecisionTreeClassifier(random_state=42)
iris = load_iris()
clf = clf.fit(iris.data, iris.target)
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)

# empty all nodes, i.e. set color to white and number of samples to zero
for node in graph.get_node_list():
    if node.get_attributes().get('label') is None:
        continue
    if 'samples = ' in node.get_attributes()['label']:
        labels = node.get_attributes()['label'].split('<br/>')
        for i, label in enumerate(labels):
            if label.startswith('samples = '):
                labels[i] = 'samples = 0'
        node.set('label', '<br/>'.join(labels))
        node.set_fillcolor('white')

samples = iris.data[129:130]
decision_paths = clf.decision_path(samples)

for decision_path in decision_paths:
    for n, node_value in enumerate(decision_path.toarray()[0]):
        if node_value == 0:
            continue
        node = graph.get_node(str(n))[0]
        node.set_fillcolor('green')
        labels = node.get_attributes()['label'].split('<br/>')
        for i, label in enumerate(labels):
            if label.startswith('samples = '):
                labels[i] = 'samples = {}'.format(int(label.split('=')[1]) + 1)
        node.set('label', '<br/>'.join(labels))

filename = 'tree.png'
graph.write_png(filename)
What I want to do is plot each sample's decision path in a separate plot in my Jupyter notebook. What should I add to the code?
As the question doesn't provide concrete expected results, I assume that you want to plot the decision paths of your classification results.
One solution is to add one more layer of for loop. However, it might affect your program's performance, so use it with caution.
import pydotplus
from IPython.display import Image, display_png  # needed for the inline display below
from sklearn.datasets import load_iris
from sklearn import tree

clf = tree.DecisionTreeClassifier(random_state=42)
iris = load_iris()
clf = clf.fit(iris.data, iris.target)
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True,
                                special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)

# empty all nodes, i.e. set color to white and number of samples to zero
for node in graph.get_node_list():
    if node.get_attributes().get('label') is None:
        continue
    if 'samples = ' in node.get_attributes()['label']:
        labels = node.get_attributes()['label'].split('<br/>')
        for i, label in enumerate(labels):
            if label.startswith('samples = '):
                labels[i] = 'samples = 0'
        node.set('label', '<br/>'.join(labels))
        node.set_fillcolor('white')

# samples = iris.data[129:130]
samples = iris.data[120:130]  # <-- Classifying 10 data points
decision_paths = clf.decision_path(samples)

for decision_path in decision_paths:
    for path in decision_path.toarray():  # <-- Adding one more layer of for loop to loop over each path
        # for n, node_value in enumerate(decision_path.toarray()[0]):
        for n, node_value in enumerate(path):
            if node_value == 0:
                continue
            node = graph.get_node(str(n))[0]
            node.set_fillcolor('green')
            labels = node.get_attributes()['label'].split('<br/>')
            for i, label in enumerate(labels):
                if label.startswith('samples = '):
                    labels[i] = 'samples = {}'.format(int(label.split('=')[1]) + 1)
            node.set('label', '<br/>'.join(labels))

# display it inline
display_png(Image(graph.create_png()))
# or save it as png
filename = 'tree.png'
graph.write_png(filename)
The result: a single rendered tree image with every node visited by the ten samples filled in green.
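If you instead want one figure per sample (the "different plots" the question asks for), a sketch of the same idea that rebuilds a fresh graph inside the loop, assuming the dot_data string from above:

from IPython.display import Image, display_png

for sample in iris.data[120:130]:
    graph = pydotplus.graph_from_dot_data(dot_data)   # fresh, unhighlighted copy
    path = clf.decision_path(sample.reshape(1, -1)).toarray()[0]
    for n, node_value in enumerate(path):
        if node_value != 0:
            graph.get_node(str(n))[0].set_fillcolor('green')
    display_png(Image(graph.create_png()))            # one plot per sample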

Why does this error pop up, what are your thoughts on my neural network/genetic algorithm?

Preamble:
This is a combination of my first and second programs in Python (besides hello-world-level tutorials). Any questions I've had have led me to this site, so it seemed fitting that I post it here. I come from a TI-Basic background, so if you have no idea why I did it this way when you should do it that way, that is likely why.
My first program was a genetic learning algorithm. Its testing setup was/is to guess your input string. There is currently a problem with it, but it only slightly affects the efficiency of the program.[1]
My second is a simple feed forward neural network (I am currently only working on the xor problem). Some of the code for customizing the variables (the number of inputs, the number of outputs, the number of hidden layers, the number of neurons in those hidden layers) is there but is currently not my focus.
What I am trying to do now is train my network with my genetic algorithm. All seems to be fine, but I keep getting an unexplainable error.
Traceback (most recent call last):
File "python", line 174, in <module>
File "python", line 68, in fitness_function
File "python", line 146, in weight_dot_value_plus_bias
TypeError: 'int' object is not subscriptable
Now the weird thing is, the code this refers to is a direct transfer of code from the original neural network.
I am using repl.it as my compiler; could that be the problem?
import random
from random import choice
from random import randint

# Global variables
length_of_phrase = 15
generation_number = 0
max_number_of_generations = 250
population = 150
perckill = 40
percparents = 35
percrandom = 1
percmutate = 1
individual_by_gene_matrix = [0]
one = 1
zero = 0
number_of_layers = 3
number_of_neurons = [2, 3, 1]
nnv = [0]*number_of_layers
nnw = [0]*number_of_layers
nnb = [0]*number_of_layers
val1 = randint(0, 1)
val2 = randint(0, 1)
living = int(((100 - perckill)*population)//100)
dead = population - living
random_strings = int((percrandom*population)//100)
reproduced_strings = int(living + random_strings)
parents = int(((100 - percparents)*population)//100)
"""
print(living)
print(dead)
print(population)
print(random_strings)
print(reproduced_strings)
"""

def random_matrix_generator():
    # generates a matrix with width = number of genes in the target and height = population
    # horizontal is traits, vertical is individuals
    # each gene represents a letter; each individual represents a word
    individual_by_gene_matrix = [[randint(-200, 200)/100 for x in range(length_of_phrase)] for x in range(population)]
    return(individual_by_gene_matrix)

"""
def convert_matrix_into_list_of_stings():
    listofstrings = [() for var in range(population)]
    for var in range(population):
        list = individual_by_gene_matrix[var]  # creates a list for each individual with their traits
        lista = [(chr(n)) for n in list]       # the traits become letters
        listofstrings[var] = ''.join(lista)    # creates a list of all the individuals with letters joined
    return(listofstrings)
"""

def fitness_function():
    for individual in range(population):
        genes = individual_by_gene_matrix[individual]  # the 15 genes w100 ... b20
        number_of_layers, number_of_neurons, nnv, nnw, nnb = NN_setup(
            val1, val2,
            genes[0], genes[1], genes[2], genes[3], genes[4],
            genes[5], genes[6], genes[7], genes[8], genes[9],
            genes[10], genes[11], genes[12], genes[13], genes[14])
        for var in range(1, number_of_layers):
            nnv = weight_dot_value_plus_bias(var)
            nnv = sigmoid(var)
        fitness[individual] = 1 - abs((val1 ^ val2) - (nnv[2][0]))
    # for n in range(population):
    #     print('{} : {} : {}'.format(n, listofstrings[n], fitness[n]))
    return(fitness)

def matrix_reorder():
    temp_individual_by_gene_matrix = [[0 for var in range(length_of_phrase)] for var in range(population)]
    temp_fitness = [0 for var in range(population)]
    for var in range(population):
        var_a = fitness.index(max(fitness))
        temp_fitness[var] = fitness.pop(var_a)
        temp_individual_by_gene_matrix[var] = individual_by_gene_matrix.pop(var_a)
    return(temp_individual_by_gene_matrix, temp_fitness)

def kill():
    for individual in range(living, population):
        individual_by_gene_matrix[individual] = [0]*length_of_phrase
    return(individual_by_gene_matrix)

def reproduce():
    for individual in range(living, reproduced_strings):
        for gene in range(length_of_phrase):
            individual_by_gene_matrix[individual][gene] = randint(-200, 200)/100
    for individual in range(reproduced_strings, population):
        mom = randint(0, parents)
        dad = randint(0, parents)
        for gene in range(length_of_phrase):
            individual_by_gene_matrix[individual][gene] = random.choice([individual_by_gene_matrix[mom][gene], individual_by_gene_matrix[dad][gene]])
    return(individual_by_gene_matrix)

def mutate():
    for individual in range(population):
        for gene in range(length_of_phrase):
            if randint(0, 100) <= percmutate:
                individual_by_gene_matrix[individual][gene] = random.gauss(individual_by_gene_matrix[individual][gene], 0.5)
    return(individual_by_gene_matrix)

def NN_setup(val1, val2, w100, w101, w110, w111, w120, w121, w200, w201, w202, b00, b01, b10, b11, b12, b20):
    number_of_layers = 3
    number_of_neurons = [2, 3, 1]
    nnv = [0]*number_of_layers
    nnw = [0]*number_of_layers
    nnb = [0]*number_of_layers
    for layer in range(number_of_layers):
        nnv[layer] = [0]*number_of_neurons[layer]
        nnw[layer] = [0]*number_of_neurons[layer]
        nnb[layer] = [0]*number_of_neurons[layer]
        if layer != 0:
            for neuron in range(number_of_neurons[layer]):
                nnw[layer][neuron] = [0]*number_of_neurons[layer - 1]
    nnv = [[val1, val2], [0.0, 0.0, 0.0], [0.0]]
    nnw = [['inputs have no weight'], [[w100, w101], [w110, w111], [w120, w121]], [[w200, w201, w202]]]
    nnb = [[b00, b01], [b10, b11, b12], [b20]]
    return(number_of_layers, number_of_neurons, nnv, nnw, nnb)

# The traceback points at the marked line in this function:
def weight_dot_value_plus_bias(layer):
    for neuron in range(number_of_neurons[layer]):
        for weight in range(number_of_neurons[layer - 1]):
            nnv[layer][neuron] += (nnv[layer-1][weight])*(nnw[layer][neuron][weight])  # ---> TypeError raised here
        nnv[layer][neuron] += nnb[layer][neuron]
    return(nnv)

def sigmoid(layer):
    for neuron in range(number_of_neurons[layer]):
        nnv[layer][neuron] = (1/(1 + 3**(-nnv[layer][neuron])))
    return(nnv)

individual_by_gene_matrix = random_matrix_generator()

while (generation_number <= max_number_of_generations):
    val1 = randint(0, 1)
    val2 = randint(0, 1)
    fitness = [0 for var in range(population)]
    # populations_phenotypes_by_individual = convert_matrix_into_list_of_stings()
    fitness = fitness_function()
    individual_by_gene_matrix, fitness = matrix_reorder()
    individual_by_gene_matrix = kill()
    individual_by_gene_matrix = reproduce()
    individual_by_gene_matrix = mutate()
    individual_by_gene_matrix, fitness = matrix_reorder()
    # populations_phenotypes_by_individual = convert_matrix_into_list_of_stings()
    print('{} {} {} {}'.format(generation_number, (10000*fitness[0])//100, val1, val2))
    generation_number += 1

print('')
print('')
print(individual_by_gene_matrix[0])
That was way too many indents!!!
How the hell do I just insert a block of code????!!!!
I'll give you the source code to the individual programs once I learn how to insert a block of code.
[1] You're going to have to wait till I give you the source code to just the genetic algorithm.
Any tips or suggestions? How would you write code to do what I'm trying to do?

scala : better solution

QUESTION:
We define super digit of an integer x using the following rules:
If x has only 1 digit, then its super digit is x.
Otherwise, the super digit of x is equal to the super digit of the digit-sum of x. Here, digit-sum of a number is defined as the sum of its digits.
For example, super digit of 9875 will be calculated as:
super-digit(9875) = super-digit(9+8+7+5)
= super-digit(29)
= super-digit(2+9)
= super-digit(11)
= super-digit(1+1)
= super-digit(2)
= 2.
You are given two numbers, n and k. You have to calculate the super digit of P.
P is created when number n is concatenated k times. That is, if n = 123 and k = 3, then P = 123123123.
Input Format
Input will contain two space separated integers, n and k.
Output Format
Output the super digit of P, where P is created as described above.
Constraints
1 ≤ n < 10^100000
1 ≤ k ≤ 10^5
Sample Input
148 3
Sample Output
3
Explanation
Here n = 148 and k = 3, so P = 148148148.
super-digit(P) = super-digit(148148148)
= super-digit(1+4+8+1+4+8+1+4+8)
= super-digit(39)
= super-digit(3+9)
= super-digit(12)
= super-digit(1+2)
= super-digit(3)
= 3.
I have written the following program to solve the above problem, but how can it be solved more efficiently? And are string operations more efficient than math operations? For a few inputs it takes a long time, for example:
861568688536788 100000
object SuperDigit {
  def main(args: Array[String]) {
    /* Enter your code here. Read input from STDIN. Print output to STDOUT. Your class should be named Solution */
    def generateString(no: String, re: BigInt, tot: BigInt, temp: String): String = {
      if (tot - 1 > re) generateString(no + temp, re + 1, tot, temp)
      else no
    }
    def totalSum(no: List[Char]): BigInt = no match {
      case x :: xs => x.asDigit + totalSum(xs)
      case Nil => '0'.asDigit
    }
    def tot(no: List[Char]): BigInt = no match {
      case _ if no.length == 1 => no.head.asDigit
      case no => tot(totalSum(no).toString.toList)
    }
    var list = readLine.split(" ");
    var one = list.head.toString();
    var two = BigInt(list(1));
    //println(generateString("148", 0, 3, "148"))
    println(tot(generateString(one, BigInt(0), two, one).toList))
  }
}
One reduction is to realise that you do not have to concatenate the number, considered as a string, k times; you can instead start with the number k * qs(n), where qs is the function that maps a number to its digit sum, i.e. qs(123) = 1+2+3. Here is a more functional-programming-style approach. I do not know whether it can be made faster than this.
object Solution {
  def qs(n: BigInt): BigInt = n.toString.foldLeft(BigInt(0))((n, ch) => n + (ch - '0').toInt)
  def main(args: Array[String]) {
    val input = scala.io.Source.stdin.getLines
    val Array(n, k) = input.next.split(" ").map(BigInt(_))
    println(Stream.iterate(k * qs(n))(qs(_)).find(_ < 10).get)
  }
}
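For the sample input this computes qs(148) = 13, starts the iteration at k * qs(n) = 3 * 13 = 39, and then iterates qs: 39 → 12 → 3, which is the first value below 10 and matches the expected output.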

How to model a mixture of 3 Normals in PyMC?

There is a question on CrossValidated on how to use PyMC to fit two Normal distributions to data. The answer of Cam.Davidson.Pilon was to use a Bernoulli distribution to assign data to one of the two Normals:
size = 10
p = Uniform("p", 0, 1)  # this is the fraction that comes from mean1 vs mean2
ber = Bernoulli("ber", p=p, size=size)  # produces 1 with proportion p
precision = Gamma('precision', alpha=0.1, beta=0.1)
mean1 = Normal("mean1", 0, 0.001)
mean2 = Normal("mean2", 0, 0.001)

@deterministic
def mean(ber=ber, mean1=mean1, mean2=mean2):
    return ber*mean1 + (1-ber)*mean2
Now my question is: how do I do it with three Normals?
Basically, the issue is that you can't use a Bernoulli distribution and 1-Bernoulli anymore. But how do you do it then?
edit: With CDP's suggestion, I wrote the following code:
import numpy as np
import pymc as mc

n = 3
ndata = 500

dd = mc.Dirichlet('dd', theta=(1,)*n)
category = mc.Categorical('category', p=dd, size=ndata)
precs = mc.Gamma('precs', alpha=0.1, beta=0.1, size=n)
means = mc.Normal('means', 0, 0.001, size=n)

@mc.deterministic
def mean(category=category, means=means):
    return means[category]

@mc.deterministic
def prec(category=category, precs=precs):
    return precs[category]

v = np.random.randint(0, n, ndata)
data = (v==0)*(50 + np.random.randn(ndata)) \
     + (v==1)*(-50 + np.random.randn(ndata)) \
     + (v==2)*np.random.randn(ndata)
obs = mc.Normal('obs', mean, prec, value=data, observed=True)
model = mc.Model({'dd': dd,
                  'category': category,
                  'precs': precs,
                  'means': means,
                  'obs': obs})
The traces with the following sampling procedure look good as well. Solved!
mcmc = mc.MCMC(model)
mcmc.sample(50000, 0)
mcmc.trace('means').gettrace()[-1, :]
There is a mc.Categorical object that does just this:
p = [0.2, 0.3, .5]
t = mc.Categorical('test', p)
t.random()
# array(2, dtype=int32)
It returns an int between 0 and len(p)-1. To model the 3 Normals, make p a mc.Dirichlet object (it accepts a length-k array as the hyperparameters; setting the values in the array to be the same sets the prior probabilities to be equal). The rest of the model is nearly identical.
This is a generalization of the model I suggested above.
Update:
Okay, so instead of having different means, we can collapse them all into one array:
means = Normal("means", 0, 0.001, size=3)
...
@mc.deterministic
def mean(categorical=categorical, means=means):
    return means[categorical]
