Manim Equation Transformation

Manim Community v0.15.1
class Equation_Transformation_Bug(Scene):
    def construct(self):
        equation_1 = MathTex("w", "\\times", "v", "=", "1")
        equation_1.shift(UP*2).scale(2)
        equation_2 = MathTex("v", "=", "w^{-1}")
        equation_2.scale(2)
        equation_3 = MathTex("w", "\\times", "w^{-1}", "=", "1")
        equation_3.shift(UP*2).scale(2)
        self.play(Write(equation_1), Write(equation_2))
        self.wait(2)
        self.play(FadeOut(equation_1[2]))
        self.play(*[
            Transform(
                equation_2.get_part_by_tex("w^{-1}"),
                equation_3.get_part_by_tex("w^{-1}")
            )
        ] + [
            Transform(
                equation_1.get_part_by_tex(tex),
                equation_3.get_part_by_tex(tex)
            )
            for tex in ("w", "\\times", "=", "1")
        ])
        self.wait(1)
I'm trying to get the w^{-1} from equation_2 to fly into the spot formerly occupied by the v of equation_1 and thereby turn equation_1 into equation_3.
Instead, the "1" from equation_1 transforms into the w^{-1} of equation_3.
I'm not trying to do a replacement transform.
How do I transform equation_1 into equation_3 while moving the w^{-1} into the spot occupied by the "v" of equation_1?

An approach using TransformMatchingShapes works reasonably well in this particular case:
class Eq(Scene):
    def construct(self):
        equation_1 = MathTex("w", "\\times", "v", "=", "1")
        equation_1.shift(UP*2).scale(2)
        equation_2 = MathTex("v", "=", "w^{-1}")
        equation_2.scale(2)
        equation_3 = MathTex("w", "\\times", "w^{-1}", "=", "1")
        equation_3.shift(UP*2).scale(2)
        self.play(Write(equation_1), Write(equation_2))
        self.wait(2)
        self.play(FadeOut(equation_1[2]))
        self.play(
            TransformMatchingShapes(
                VGroup(equation_1[0:2], equation_1[3:], equation_2[2].copy()),
                equation_3,
            )
        )
If you have shapes that would not match uniquely, take a look at the implementation of TransformMatchingShapes: there is a way to tweak exactly what gets transformed into what.
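For reference, here is a minimal sketch of one such tweak, assuming the tex-based sibling TransformMatchingTex is an option: it matches submobjects by their tex strings, so its key_map argument can be written by hand (here "v" is explicitly told to become "w^{-1}"). This is not the exact two-equation scenario above, just an illustration of the hook:

from manim import *

class EqKeyMap(Scene):
    def construct(self):
        equation_1 = MathTex("w", "\\times", "v", "=", "1").shift(UP*2).scale(2)
        equation_3 = MathTex("w", "\\times", "w^{-1}", "=", "1").shift(UP*2).scale(2)
        self.play(Write(equation_1))
        self.wait(1)
        # key_map maps a tex key in the source to a tex key in the target,
        # overriding the default "same string" matching.
        self.play(
            TransformMatchingTex(
                equation_1,
                equation_3,
                key_map={"v": "w^{-1}"},
            )
        )
        self.wait(1)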

Related

RayTune HyperOptSearch - fitting resampling into pipeline throws error: All intermediate steps should be transformers and implement fit and transform

I'm getting started with Raytune and trying to set up a HyperOptSearch with imbalanced data.
Fitting a pipeline without RandomOverSampler works fine, but when I add that in, I get the error:
TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough'
Code sample below; it works fine without the RandomOverSampler step:
cfg_hgb = {
    'clf__learning_rate' : tune.loguniform(0.001, 0.8),
    'clf__max_leaf_nodes' : tune.randint(2, 20),
    'clf__min_samples_leaf' : tune.randint(50, 500),
    'clf__max_depth' : tune.randint(2, 15),
    'clf__l2_regularization' : tune.loguniform(0.001, 1000),
    'clf__max_iter' : tune.choice([800]),
}

hyperopt = HyperOptSearch(
    # space=cfg_hgb,
    metric="mean_auc",
    mode="max",
    points_to_evaluate=None,
    n_initial_points=20,
    random_state_seed=RANDOM_STATE,
    gamma=0.25,
)

def train_hgb(config):
    # LOAD DATA
    X, y, nominal, ordinal, numeric = load_clean_data()
    # LOAD TRANSFORMERS
    prep = Preprocessor(nominal, ordinal, numeric)
    # CHOOSE CV STRATEGY
    splitter = StratifiedKFold(CV, random_state=RANDOM_STATE, shuffle=True)
    # TRAIN
    scores = []
    for train_ind, val_ind in splitter.split(X, y):
        hgb_os = Pipeline(steps=[
            ('coltrans', prep.transformer_ord),
            ('ros', RandomOverSampler(random_state=RANDOM_STATE)),  # if I comment this out, it works fine
            ('clf', HistGradientBoostingClassifier(
                categorical_features=prep.cat_feature_mask,
                random_state=RANDOM_STATE))
        ])
        hgb_os.set_params(**config)
        hgb_os.fit(X.iloc[train_ind], y[train_ind])
        y_pred = hgb_os.predict(X.iloc[val_ind])
        scores.append(roc_auc_score(y_true=y[val_ind], y_score=y_pred, average="macro"))
    # REPORT SCORES
    session.report({
        'mean_auc' : np.array(scores).mean(),
        'std_auc' : np.array(scores).std(),
    })

tuner = tune.Tuner(
    trainable=train_hgb,
    param_space=cfg_hgb,
    tune_config=tune.TuneConfig(
        num_samples=10,
        search_alg=hyperopt,
    ),
    run_config=RunConfig(
        name="experiment_name",
        local_dir="./results/hgb",
    )
)
results = tuner.fit()
By contrast, when using ray.tune.sklearn.TuneSearchCV, RandomOverSampler works fine in the pipeline:
hgb_tune = {
    'learning_rate' : tune.loguniform(0.001, 0.15),
    'max_leaf_nodes' : tune.randint(2, 4),
    'min_samples_leaf' : tune.randint(160, 300),
    'max_depth' : tune.randint(2, 7),
    'l2_regularization' : tune.loguniform(5, 1000),
    'max_iter' : tune.choice([400]),
}

hgb_os = Pipeline(steps=[
    ('trans', prep.transformer_ord),
    ('ros', RandomOverSampler(random_state=RANDOM_STATE)),
    ('clf', TuneSearchCV(
        HistGradientBoostingClassifier(
            categorical_features=prep.cat_feature_mask,
            random_state=RANDOM_STATE),
        param_distributions=hgb_tune,
        cv=CV, scoring=SCORER,
        verbose=VERBOSE, search_optimization="bayesian",
        n_trials=N_TRIALS, ))  # local_dir='~/rayresults/hgbtune'
])
results, params = fit_eval(hgb_os, X_train, X_test, y_train, y_test)
I understand that the Pipeline probably expects .fit_transform from intermediate steps, whereas RandomOverSampler only implements .fit_resample. Note too that RandomOverSampler requires imblearn.pipeline.Pipeline rather than sklearn.pipeline.Pipeline, so perhaps therein lies the problem.
Is there a way to add any form of resampling with the current Tuner API, or do I need to break the pipeline apart and do the resampling outside of this loop first?
Thanks in advance.
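(For reference, a minimal untested sketch of the imblearn route mentioned above, assuming imblearn's own Pipeline is acceptable inside the trainable; it accepts steps that implement fit_resample, which sklearn's Pipeline rejects. prep and random_state are stand-ins for the Preprocessor and RANDOM_STATE from the question.)

from imblearn.pipeline import Pipeline as ImbPipeline
from imblearn.over_sampling import RandomOverSampler
from sklearn.ensemble import HistGradientBoostingClassifier

def build_pipeline(prep, random_state):
    # Same steps as in the question, but built with imblearn's Pipeline,
    # which allows samplers (fit_resample) as intermediate steps.
    return ImbPipeline(steps=[
        ('coltrans', prep.transformer_ord),
        ('ros', RandomOverSampler(random_state=random_state)),
        ('clf', HistGradientBoostingClassifier(
            categorical_features=prep.cat_feature_mask,
            random_state=random_state)),
    ])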

Macro average (iterate over classes) in custom training loop - tensorflow

I am using a custom training loop. The task is multi-label, multi-class classification, i.e. I have multiple classes I want to predict and each class admits multiple labels. loss has shape (batch_size, no_classes); as said before, each column in no_classes is a multi-label classification task. The following code works when the @tf.function decorator is commented out, but once graph mode is on it fails, since iterating over a tensor is not allowed in graph mode. Would anyone be able to suggest how I can rewrite the code below so that it works in graph mode?
items_loss_list = []
for item in range(loss.shape[1]):
    values, _ = tf.unique(y[:, item])
    item_macro_average = tf.reduce_mean(
        [
            tf.reduce_mean(
                tf.gather_nd(
                    loss[:, item],
                    indices=tf.cast(tf.where(y[:, item] == v), tf.int32),
                )
            )
            for v in values
        ]
    )
    items_loss_list.append(item_macro_average)
I also tried:
i = tf.constant(0)
while_condition = lambda i: tf.less(i, len(values))
item_score_avg = []

def body(i):
    item_score_avg.append(
        tf.reduce_mean(
            tf.gather_nd(
                loss[:, item],
                indices=tf.cast(tf.where(y[:, item] == values[i]), tf.int32),
            )
        )
    )
    return [tf.add(i, 1)]

tf.while_loop(while_condition, body, [i])
items_loss_list.append(tf.reduce_mean(item_score_avg))
But this is not working either in graph mode. Thank you for your help!
Apparently tf.map_fn solves the problem. items_macro_average is a list collecting the macro-average loss per task.
items_macro_average = []
for item in range(loss.shape[1]):
    values, _ = tf.unique(y[:, item])
    item_macro_average = tf.reduce_mean(
        tf.map_fn(
            fn=lambda v: tf.reduce_mean(
                tf.gather_nd(
                    loss[:, item],
                    indices=tf.cast(tf.where(y[:, item] == v), tf.int32),
                )
            ),
            elems=values,
        )
    )
    items_macro_average.append(item_macro_average)
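As a self-contained check, here is a sketch with made-up shapes (8 examples, 3 tasks, integer labels). fn_output_signature is passed because the labels here are integers while the mapped function returns floats; this assumes a reasonably recent TF 2.x:

import tensorflow as tf

y = tf.constant([[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2],
                 [1, 0, 0], [2, 2, 1], [0, 1, 2], [1, 0, 0]], dtype=tf.int32)
loss = tf.random.uniform((8, 3))

@tf.function
def macro_average_loss(loss, y):
    items_macro_average = []
    for item in range(loss.shape[1]):  # plain Python loop over a static dimension is fine in graph mode
        values, _ = tf.unique(y[:, item])
        item_macro_average = tf.reduce_mean(
            tf.map_fn(
                fn=lambda v: tf.reduce_mean(
                    tf.gather_nd(
                        loss[:, item],
                        indices=tf.cast(tf.where(y[:, item] == v), tf.int32),
                    )
                ),
                elems=values,
                fn_output_signature=tf.float32,
            )
        )
        items_macro_average.append(item_macro_average)
    return tf.stack(items_macro_average)

print(macro_average_loss(loss, y))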

HyperOptSearch and ray.tune

I'm trying to do parameter optimisation with HyperOptSearch and ray.tune. The code works with hyperopt (without tune), but I wanted it to be faster and therefore to use tune. Unfortunately I could not find many examples, so I am not sure about the code. I use a pipeline with XGBoost, but I don't want to optimise only the XGBoost parameters: I also want to optimise another parameter in the pipeline, the one used for the encoding. Is this possible to do with tune? My code is below.
from hyperopt import hp
from ray import tune
from ray.tune.suggest.hyperopt import HyperOptSearch

def train_model(space, reporter):
    # target encoding
    columns_te = no_of_classes[no_of_classes.counts > space['enc_threshold']].feature.values.tolist()
    # one hot encoding
    columns_ohe = categorical.columns[~categorical.columns.isin(columns_te)].tolist()
    # model
    pipe1 = SKPipeline([('ohe',
                         OneHotEncoder(cols=columns_ohe, return_df=True, handle_unknown='ignore', use_cat_names=True)),
                        ('te',
                         TargetEncoder(cols=columns_te, min_samples_leaf=space['min_samples_leaf']))])
    pipe2 = IMBPipeline([
        ('sampling', RandomUnderSampler()),
        ('clf', xgb.XGBClassifier(**space, n_jobs=-1))
    ])
    model = SKPipeline([('pipe1', pipe1), ('pipe2', pipe2)])
    optimizer = SGD()
    dataset = xx
    accuracy = model.fit(dataset.drop(['yy']), dataset.yy)
    reporter(mean_accuracy=roc_auc)

if __name__ == '__main__':
    ray.init()
    space = {'eta': hp.uniform('eta', 0.001, 0.1),
             'max_depth': scope.int(hp.quniform('max_depth', 1, 5, 1)),
             'min_child_weight': hp.uniform('min_child_weight', 0.1, 1.5),
             'n_estimators': scope.int(hp.quniform('n_estimators', 20, 200, 10)),
             'subsample': hp.uniform('subsample', 0.5, 0.9),
             'colsample_bytree': hp.uniform('colsample_bytree', 0.5, 0.9),
             'gamma': hp.uniform('gamma', 0, 5),
             'min_samples_leaf': scope.int(hp.quniform('min_samples_leaf', 10, 200, 20)),
             'nrounds': scope.int(hp.quniform('nrounds', 100, 1500, 50))
             }
    algo = HyperOptSearch(space, max_concurrent=5, metric='roc_auc', mode="max")
    tune.run(train_model, num_samples=10, search_alg=algo)
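(One hedged sketch of the idea, for illustration only: the config dict handed to the trainable can carry any value the trainable knows how to use, such as the encoding threshold, next to the XGBoost hyperparameters. The names mirror the question, the reported score is a placeholder, and nothing here is tested.)

def train_model(config, reporter):
    # The config dict can carry any parameter the trainable knows how to use,
    # not only estimator hyperparameters.
    enc_threshold = config['enc_threshold']  # encoding/pipeline parameter
    xgb_params = {k: v for k, v in config.items() if k != 'enc_threshold'}
    # ... build the encoders using enc_threshold, build xgb.XGBClassifier(**xgb_params),
    # fit, and compute roc_auc on a validation split here ...
    reporter(roc_auc=0.0)  # placeholder; report the real validation score instead

The threshold would then be added to the hyperopt space alongside the other entries, e.g. 'enc_threshold': scope.int(hp.quniform('enc_threshold', 10, 200, 10)).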

What is the difference between gensim LabeledSentence and TaggedDocument

Please help me understand the difference between how TaggedDocument and LabeledSentence of gensim work. My ultimate goal is text classification using a Doc2Vec model and any classifier. I am following this blog!
class MyLabeledSentences(object):
    def __init__(self, dirname, dataDct={}, sentList=[]):
        self.dirname = dirname
        self.dataDct = {}
        self.sentList = []

    def ToArray(self):
        for fname in os.listdir(self.dirname):
            with open(os.path.join(self.dirname, fname)) as fin:
                for item_no, sentence in enumerate(fin):
                    self.sentList.append(LabeledSentence([w for w in sentence.lower().split() if w in stopwords.words('english')], [fname.split('.')[0].strip() + '_%s' % item_no]))
        return self.sentList

class MyTaggedDocument(object):
    def __init__(self, dirname, dataDct={}, sentList=[]):
        self.dirname = dirname
        self.dataDct = {}
        self.sentList = []

    def ToArray(self):
        for fname in os.listdir(self.dirname):
            with open(os.path.join(self.dirname, fname)) as fin:
                for item_no, sentence in enumerate(fin):
                    self.sentList.append(TaggedDocument([w for w in sentence.lower().split() if w in stopwords.words('english')], [fname.split('.')[0].strip() + '_%s' % item_no]))
        return self.sentList

sentences = MyLabeledSentences(some_dir_name)
model_l = Doc2Vec(min_count=1, window=10, size=300, sample=1e-4, negative=5, workers=7)
sentences_l = sentences.ToArray()
model_l.build_vocab(sentences_l)
for epoch in range(15):
    random.shuffle(sentences_l)
    model_l.train(sentences_l)
    model_l.alpha -= 0.002  # decrease the learning rate
    model_l.min_alpha = model_l.alpha

sentences = MyTaggedDocument(some_dir_name)
model_t = Doc2Vec(min_count=1, window=10, size=300, sample=1e-4, negative=5, workers=7)
sentences_t = sentences.ToArray()
model_t.build_vocab(sentences_t)
for epoch in range(15):
    random.shuffle(sentences_t)
    model_t.train(sentences_t)
    model_t.alpha -= 0.002  # decrease the learning rate
    model_t.min_alpha = model_t.alpha
My question is: is model_l.docvecs['some_word'] the same as model_t.docvecs['some_word']?
Can you provide links to good sources for getting a grasp on how TaggedDocument or LabeledSentence works?
LabeledSentence is an older, deprecated name for the same simple object-type to encapsulate a text-example that is now called TaggedDocument. Any objects that have words and tags properties, each a list, will do. (words is always a list of strings; tags can be a mix of integers and strings, but in the common and most-efficient case, is just a list with a single id integer, starting at 0.)
model_l and model_t will serve the same purposes, having trained on the same data with the same parameters, using just different names for the objects. But the vectors they'll return for individual word-tokens (model['some_word']) or document-tags (model.docvecs['somefilename_NN']) will likely be different – there's randomness in Word2Vec/Doc2Vec initialization and training-sampling, and introduced by ordering-jitter from multithreaded training.
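For illustration, here is a minimal, made-up sketch of constructing TaggedDocument objects directly and training a Doc2Vec on them. It is written against the current gensim API, where the old size parameter is vector_size, train() takes total_examples and epochs explicitly, and document vectors live under model.dv (older versions expose model.docvecs instead):

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus: `words` is a list of string tokens, `tags` is a list holding a single id.
docs = [
    TaggedDocument(words=["the", "cat", "sat", "on", "the", "mat"], tags=["doc_0"]),
    TaggedDocument(words=["dogs", "bark", "at", "the", "mailman"], tags=["doc_1"]),
]

model = Doc2Vec(vector_size=50, min_count=1, epochs=20, workers=1)
model.build_vocab(docs)
model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)

print(model.dv["doc_0"])  # vector learned for the document tagged "doc_0"
print(model.wv["cat"])    # vector learned for the word-token "cat"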

Using a complex likelihood in PyMC3

pymc.__version__ = '3.0'
theano.__version__ = '0.6.0.dev-RELEASE'
I'm trying to use PyMC3 with a complex likelihood function:
First question: Is this possible?
Here's my attempt using Thomas Wiecki's post as a guide:
import numpy as np
import theano as th
import pymc as pm
import scipy as sp

# Actual data I'm trying to fit
x = np.array([52.08, 58.44, 60.0, 65.0, 65.10, 66.0, 70.0, 87.5, 110.0, 126.0])
y = np.array([0.522, 0.659, 0.462, 0.720, 0.609, 0.696, 0.667, 0.870, 0.889, 0.919])
yerr = np.array([0.104, 0.071, 0.138, 0.035, 0.102, 0.096, 0.136, 0.031, 0.024, 0.035])

th.config.compute_test_value = 'off'
a = th.tensor.dscalar('a')

with pm.Model() as model:
    # Priors
    alpha = pm.Normal('alpha', mu=0.3, sd=5)
    sig_alpha = pm.Normal('sig_alpha', mu=0.03, sd=5)
    t_double = pm.Normal('t_double', mu=4, sd=20)
    t_delay = pm.Normal('t_delay', mu=21, sd=20)
    nu = pm.Uniform('nu', lower=0, upper=20)

    # Some functions needed for calculation of the y estimator
    def T(eqd):
        doses = np.array([52.08, 58.44, 60.0, 65.0, 65.10,
                          66.0, 70.0, 87.5, 110.0, 126.0])
        tmt_times = np.array([29, 29, 43, 29, 36, 48, 22, 11, 7, 8])
        return np.interp(eqd, doses, tmt_times)

    def TCP(a):
        time = T(x)
        BCP = pm.exp(-1E7*pm.exp(-alpha*x*1.2 + 0.69315/t_delay(time-t_double)))
        return pm.prod(BCP)

    def normpdf(a, alpha, sig_alpha):
        return 1./(sig_alpha*pm.sqrt(2.*np.pi))*pm.exp(-pm.sqr(a-alpha)/(2*pm.sqr(sig_alpha)))

    def normcdf(a, alpha, sig_alpha):
        return 1./2.*(1+pm.erf((a-alpha)/(sig_alpha*pm.sqrt(2))))

    def integrand(a):
        return normpdf(a, alpha, sig_alpha)/(1.-normcdf(0, alpha, sig_alpha))*TCP(a)

    func = th.function([a, alpha, sig_alpha, t_double, t_delay], integrand(a))
    y_est = sp.integrate.quad(func(a, alpha, sig_alpha, t_double, t_delay), 0, np.inf)[0]

    likelihood = pm.T('TCP', mu=y_est, nu=nu, observed=y_tcp)

    start = pm.find_MAP()
    step = pm.NUTS(state=start)
    trace = pm.sample(2000, step, start=start, progressbar=True)
which produces the following message regarding the expression for y_est:
TypeError: ('Bad input argument to theano function with name ":42" at index 0(0-based)', 'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?')
I've overcome various other hurdles to get this far, and this is where I'm stuck. So, provided the answer to my first question is 'yes', then am I on the right track? Any guidance would be helpful!
N.B. Here is a similar question I found, and another.
Disclaimer: I'm very new at this. My only previous experience is successfully reproducing the linear regression example in Thomas' post. I've also successfully run the Theano test suite, so I know it works.
Yes, it's possible to make something with a complex or arbitrary likelihood, though that doesn't seem to be what you're doing here. It looks like you have a complex transformation of one variable into another: the integration step.
Your particular exception is that integrate.quad is expecting a numpy array, not a pymc Variable. If you want to do quad within pymc, you'll have to make a custom theano Op (with derivative) for it.
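Roughly, such an Op could look like the sketch below (my own illustration rather than anything from the question: it wraps scipy.integrate.quad around a plain numpy integrand of (a, alpha, sig_alpha); no gradient is defined, so a gradient-free step method such as Metropolis would be needed instead of NUTS):

import numpy as np
import theano
import theano.tensor as tt
from scipy import integrate

class IntegrateOverA(theano.Op):
    """Integrate a plain numpy function f(a, alpha, sig_alpha) over a on [0, inf)."""

    def __init__(self, f):
        self.f = f

    def make_node(self, alpha, sig_alpha):
        alpha = tt.as_tensor_variable(alpha)
        sig_alpha = tt.as_tensor_variable(sig_alpha)
        return theano.Apply(self, [alpha, sig_alpha], [tt.dscalar()])

    def perform(self, node, inputs, output_storage):
        alpha, sig_alpha = inputs
        val, _ = integrate.quad(self.f, 0, np.inf, args=(float(alpha), float(sig_alpha)))
        output_storage[0][0] = np.asarray(val)

# Usage inside the model would then be something like:
#     y_est = IntegrateOverA(numpy_integrand)(alpha, sig_alpha)
# where numpy_integrand is a numpy-only rewrite of integrand() above.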
