How to calculate the accuracy of a custom-made BERT model - huggingface-transformers

I'm still new to BERT and deep learning in general. How can I calculate the accuracy of this model?
This is the code for training the model:
x = {"input_ids": encoded_texts["input_ids"],
     "token_type_ids": encoded_texts["token_type_ids"],
     "attention_mask": encoded_texts["attention_mask"]}
history = model.fit(
    x, (encoded_slots, encoded_intents),
    epochs=4, batch_size=16, shuffle=True, validation_split=0.2)
And when I try to test it, the result is a tuple. I have no idea how to proceed from here:
y = {"input_ids": encoded_texts["input_ids"],
     "token_type_ids": encoded_texts["token_type_ids"],
     "attention_mask": encoded_texts["attention_mask"]}
result = model.predict(y)
print(result)
(array([[[ 8.06351662e+00, -1.32801700e+00, -4.86241132e-01, ..., -7.85976648e-01, -2.50020772e-01,  1.05790246e+00],
         [ 6.22850323e+00, -8.28452349e-01, -5.93462493e-03, ..., -6.97286367e-01,  1.13765049e+00,  1.02008772e+00],
         [ 6.02462339e+00, -8.69148612e-01, -8.80611539e-02, ..., -8.40042010e-02,  7.15953529e-01,  5.95877230e-01],
         ...,
         [ 8.01505756e+00, -1.08957875e+00,  2.24328581e-02, ..., -1.28074026e+00, -9.73342732e-02,  6.63778067e-01],
         [ 8.12351418e+00, -6.43307209e-01, -2.73164481e-01, ..., -7.59647727e-01,  6.21648252e-01,  6.72666132e-01],
         [ 8.08487988e+00, -7.82105029e-01, -2.72888869e-01, ..., -8.51698875e-01,  4.32829887e-01,  6.83589280e-01]],

        [[ 6.45053625e+00, -9.56276298e-01, -3.08830112e-01, ..., -6.64760232e-01, -7.55424276e-02,  1.36750138e+00],
         [ 6.22469139e+00, -6.72929108e-01, -5.13188660e-01, ...,  5.48318587e-02,  1.66367337e-01,  1.10040116e+00],
         [ 5.60579491e+00, -1.08946371e+00,  2.23633960e-01, ...,  4.21536475e-01, -1.19417921e-01,  1.21391022e+00],
         ...,
How can I calculate the accuracy out of this?
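One way to get accuracies out of that tuple, as a minimal sketch: assuming the first element holds the token-level slot logits and the second the intent logits (matching the (encoded_slots, encoded_intents) order passed to model.fit), and that the labels are available as integer arrays, take an argmax over the last axis and compare against the labels.

import numpy as np

slot_logits, intent_logits = result                      # the tuple from model.predict

# Intent accuracy: one prediction per sequence.
intent_preds = np.argmax(intent_logits, axis=-1)
intent_acc = np.mean(intent_preds == np.asarray(encoded_intents))

# Slot accuracy: one prediction per token; mask out padding positions.
slot_preds = np.argmax(slot_logits, axis=-1)
mask = np.asarray(encoded_texts["attention_mask"]).astype(bool)
slot_acc = np.mean(slot_preds[mask] == np.asarray(encoded_slots)[mask])

print("intent accuracy:", intent_acc)
print("slot accuracy:", slot_acc)

Alternatively, compiling the model with metrics=["accuracy"] would make Keras report a per-output accuracy during model.fit and from model.evaluate.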

Related

Problem following along with a Kaggle notebook: "max() received an invalid combination of arguments"

For study purposes I am following along with a very popular notebook for sentiment classification with BERT:
Kaggle notebook for sentiment classification with BERT
But instead of training the model as in the notebook, I just load another model:
MODEL_NAME = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
and want to test it on my data, to get a heatmap and accuracy score like at the end of that notebook.
But when I get to the evaluation step, I get:
TypeError: max() received an invalid combination of arguments - got (SequenceClassifierOutput, dim=int), but expected one of:
* (Tensor input)
* (Tensor input, Tensor other, *, Tensor out)
* (Tensor input, int dim, bool keepdim, *, tuple of Tensors out)
* (Tensor input, name dim, bool keepdim, *, tuple of Tensors out)
in the evaluation function, where it says:
_, preds = torch.max(outputs, dim=1)
I tried to change this to
_, preds = torch.max(torch.tensor(outputs), dim=1)
But then I got another error:
RuntimeError: Could not infer dtype of SequenceClassifierOutput
The evaluation method looks like this:
def eval_model(model, data_loader, loss_fn, device, n_examples):
    model = model.eval()
    losses = []
    correct_predictions = 0
    with torch.no_grad():
        for d in data_loader:
            input_ids = d["input_ids"].to(device)
            attention_mask = d["attention_mask"].to(device)
            targets = d["targets"].to(device)
            # Get model outputs
            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask,
            )
            _, preds = torch.max(outputs, dim=1)
            loss = loss_fn(outputs, targets)
            correct_predictions += torch.sum(preds == targets)
            losses.append(loss.item())
    return correct_predictions.double() / n_examples, np.mean(losses)
And outputs itself in the code above looks like this:
SequenceClassifierOutput(loss=None, logits=tensor([[ 2.2241, 1.2025, 0.1638, -1.4620, -1.6424],
[ 3.1578, 1.3957, -0.1131, -1.8141, -1.9536],
[ 0.7273, 1.7851, 1.1237, -0.9063, -2.3822],
[ 0.9843, 0.9711, 0.5067, -0.7553, -1.4547],
[-0.4127, -0.8895, 0.0572, 0.3550, 0.7377],
[-0.4885, 0.6933, 0.8272, -0.3176, -0.7546],
[ 1.3953, 1.4224, 0.7842, -0.9143, -2.2898],
[-2.4618, -1.2675, 0.5480, 1.4326, 1.2893],
[ 2.5044, 0.9191, -0.1483, -1.4413, -1.4156],
[ 1.3901, 1.0331, 0.4259, -0.8006, -1.6999],
[ 4.2252, 2.6539, -0.0392, -2.6362, -3.3261],
[ 1.9750, 1.8845, 0.6779, -1.3163, -2.5570],
[ 5.1688, 2.2360, -0.6230, -2.9657, -2.9031],
[ 1.1857, 0.4277, -0.1837, -0.7163, -0.6682],
[ 2.1133, 1.3829, 0.5750, -1.3095, -2.2234],
[ 2.3258, 0.9406, -0.0115, -1.1673, -1.6775]], device='cuda:0'), hidden_states=None, attentions=None)
How can I make it work?
Kind regards
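A minimal sketch of the likely fix, given that Hugging Face sequence-classification models return a SequenceClassifierOutput object rather than a plain tensor: take its .logits field (equivalently outputs[0]) before calling torch.max, and pass the same logits to the loss function.

outputs = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
)
logits = outputs.logits                  # the raw class scores shown in the dump above
_, preds = torch.max(logits, dim=1)
loss = loss_fn(logits, targets)
correct_predictions += torch.sum(preds == targets)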

RobustScaler in PySpark

I would like to use a RobustScaler for preprocessing data. In sklearn it can be found in
sklearn.preprocessing.RobustScaler
. However, I am using pyspark, so I tried to import it with:
from pyspark.ml.feature import RobustScaler
However, I receive the following error:
ImportError: cannot import name 'RobustScaler' from 'pyspark.ml.feature'
As pault pointed out, RobustScaler is only implemented in PySpark 3. I am trying to implement it myself as:
import numpy as np
import pandas as pd
from pyspark.ml import Pipeline
from pyspark.sql import functions as sf
from sklearn.preprocessing import RobustScaler

class PySpark_RobustScaler(Pipeline):
    def __init__(self):
        pass

    def fit(self, df):
        return self

    def transform(self, df):
        self._df = df
        for col_name in self._df.columns:
            q1, q2, q3 = self._df.approxQuantile(col_name, [0.25, 0.5, 0.75], 0.00)
            self._df = self._df.withColumn(col_name, 2.0 * (sf.col(col_name) - q2) / (q3 - q1))
        return self._df

arr = np.array(
    [[ 1., -2.,  2.],
     [-2.,  1.,  3.],
     [ 4.,  1., -2.]]
)
# sc is an existing SparkContext
rdd1 = sc.parallelize(arr)
rdd2 = rdd1.map(lambda x: [int(i) for i in x])
df_sprk = rdd2.toDF(["A", "B", "C"])
df_pd = pd.DataFrame(arr, columns=list('ABC'))

PySpark_RobustScaler().fit(df_sprk).transform(df_sprk).show()
print(RobustScaler().fit(df_pd).transform(df_pd))
However, I have found that to obtain the same result as sklearn I have to multiply the result by 2. Furthermore, I am worried that if a column has many values close to zero, the interquartile range q3-q1 could become very small and make the result diverge, creating null values.
Does anyone have any suggestions on how to improve it?
This feature has been released in recent PySpark versions (3.0 and later).
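For reference, a minimal sketch of the built-in scaler (assuming PySpark >= 3.0 and the df_sprk DataFrame from above); it operates on a single vector column, so the numeric columns are assembled first.

from pyspark.ml.feature import VectorAssembler, RobustScaler

assembler = VectorAssembler(inputCols=["A", "B", "C"], outputCol="features")
assembled = assembler.transform(df_sprk)

scaler = RobustScaler(inputCol="features", outputCol="scaled",
                      withCentering=True, withScaling=True,
                      lower=0.25, upper=0.75)
scaler.fit(assembled).transform(assembled).select("scaled").show(truncate=False)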

Discrepancies in gensim doc2vec embedding vectors

I use the gensim Doc2Vec package to train doc2vec embeddings. I would expect that two models trained with identical parameters and data would have very close doc2vec vectors. However, in my experience this is only true for doc2vec trained in PV-DBOW mode without training word embeddings (dbow_words=0).
For PV-DM, and for PV-DBOW with dbow_words=1, i.e. in every case where the word embeddings are trained along with the doc vectors, the doc2vec embedding vectors of identically trained models are fairly different.
Here is my code:
from sklearn.datasets import fetch_20newsgroups
from gensim import models
import scipy.spatial.distance as distance
import numpy as np
from nltk.corpus import stopwords
from string import punctuation
def clean_text(texts, min_length=2):
    clean = []
    # don't remove apostrophes
    translator = str.maketrans(punctuation.replace('\'', ' '), ' ' * len(punctuation))
    for text in texts:
        text = text.translate(translator)
        tokens = text.split()
        # remove non-alphabetic tokens
        tokens = [word.lower() for word in tokens if word.isalpha()]
        # filter out stop words
        stop_words = stopwords.words('english')
        tokens = [w for w in tokens if not w in stop_words]
        # filter out short tokens
        tokens = [word for word in tokens if len(word) >= min_length]
        tokens = ' '.join(tokens)
        clean.append(tokens)
    return clean
def tag_text(all_text, tag_type=''):
    tagged_text = []
    for i, text in enumerate(all_text):
        tag = tag_type + '_' + str(i)
        tagged_text.append(models.doc2vec.TaggedDocument(text.split(), [tag]))
    return tagged_text
def train_docvec(dm, dbow_words, min_count, epochs, training_data):
    model = models.Doc2Vec(dm=dm, dbow_words=dbow_words, min_count=min_count)
    model.build_vocab(tagged_data)
    model.train(training_data, total_examples=len(training_data), epochs=epochs)
    return model
def compare_vectors(vector1, vector2):
    cos_distances = []
    for i in range(len(vector1)):
        d = distance.cosine(vector1[i], vector2[i])
        cos_distances.append(d)
    print(np.median(cos_distances))
    print(np.std(cos_distances))
dataset = fetch_20newsgroups(shuffle=True, random_state=1,remove=('headers', 'footers', 'quotes'))
n_samples = len(dataset.data)
data = clean_text(dataset.data)
tagged_data = tag_text(data)
data_labels = dataset.target
data_label_names = dataset.target_names
model_dbow1 = train_docvec(0, 0, 4, 30, tagged_data)
model_dbow2 = train_docvec(0, 0, 4, 30, tagged_data)
model_dbow3 = train_docvec(0, 1, 4, 30, tagged_data)
model_dbow4 = train_docvec(0, 1, 4, 30, tagged_data)
model_dm1 = train_docvec(1, 0, 4, 30, tagged_data)
model_dm2 = train_docvec(1, 0, 4, 30, tagged_data)
compare_vectors(model_dbow1.docvecs, model_dbow2.docvecs)
> 0.07795828580856323
> 0.02610614028793008
compare_vectors(model_dbow1.docvecs, model_dbow3.docvecs)
> 0.6476179957389832
> 0.14797587172616306
compare_vectors(model_dbow3.docvecs, model_dbow4.docvecs)
> 0.19878000020980835
> 0.06362519480831186
compare_vectors(model_dm1.docvecs, model_dm2.docvecs)
> 0.13536489009857178
> 0.045365127475424386
compare_vectors(model_dbow1.docvecs, model_dm1.docvecs)
> 0.6358324736356735
> 0.15150255674571805
UPDATE
I tried, as suggested by gojomo, to compare the differences between the vectors, and, unfortunately, those are even worse:
def compare_vector_differences(vector1, vector2):
    diff1 = []
    diff2 = []
    for i in range(len(vector1) - 1):
        diff1.append(vector1[i + 1] - vector1[i])
    for i in range(len(vector2) - 1):
        diff2.append(vector2[i + 1] - vector2[i])
    cos_distances = []
    for i in range(len(diff1)):
        d = distance.cosine(diff1[i], diff2[i])
        cos_distances.append(d)
    print(np.median(cos_distances))
    print(np.std(cos_distances))
compare_vector_differences(model_dbow1.docvecs, model_dbow2.docvecs)
> 0.1134452223777771
> 0.02676398444178949
compare_vector_differences(model_dbow1.docvecs, model_dbow3.docvecs)
> 0.8464127033948898
> 0.11423789350773429
compare_vector_differences(model_dbow4.docvecs, model_dbow3.docvecs)
> 0.27400463819503784
> 0.05984108730423529
SECOND UPDATE
This time, after I finally understood gojomo's suggestion, things look fine.
def compare_distance_differences(vector1, vector2):
    diff1 = []
    diff2 = []
    for i in range(len(vector1) - 1):
        diff1.append(distance.cosine(vector1[i + 1], vector1[i]))
    for i in range(len(vector2) - 1):
        diff2.append(distance.cosine(vector2[i + 1], vector2[i]))
    diff_distances = []
    for i in range(len(diff1)):
        diff_distances.append(abs(diff1[i] - diff2[i]))
    print(np.median(diff_distances))
    print(np.std(diff_distances))
compare_distance_differences(model_dbow1.docvecs, model_dbow2.docvecs)
>0.017469733953475952
>0.01659284710785352
compare_distance_differences(model_dbow1.docvecs, model_dbow3.docvecs)
>0.0786697268486023
>0.06092163158218411
compare_distance_differences(model_dbow3.docvecs, model_dbow4.docvecs)
>0.02321992814540863
>0.023095123172320778
The doc-vectors (or word-vectors) of Doc2Vec & Word2Vec models are only meaningfully comparable to other vectors that were co-trained, in the same interleaved training sessions.
Otherwise, randomness introduced by the algorithms (random-initialization & random-sampling) and by slight differences in training ordering (from multithreading) will cause the trained positions of individual vectors to wander to arbitrarily different positions. Their relative distances/directions, to other vectors that shared interleaved training, should be about as equally-useful from one model to the next.
But there's no one right place for such a vector, and measuring the differences between the vector for document '1' (or word 'foo') in one model, and the corresponding vector in another model, isn't reflective of anything the models/algorithms are trained to provide.
There's more information in the Gensim FAQ:
Q11: I've trained my Word2Vec/Doc2Vec/etc model repeatedly using the exact same text corpus, but the vectors are different each time. Is there a bug or have I made a mistake?
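To illustrate the point, a small sketch using the models and tags from the question: the raw vector for a given document differs between runs, but each model's own neighbourhoods should overlap substantially.

# '_0' is the tag of the first document produced by tag_text above
sims1 = model_dbow1.docvecs.most_similar(positive=['_0'], topn=10)
sims2 = model_dbow2.docvecs.most_similar(positive=['_0'], topn=10)

overlap = {tag for tag, _ in sims1} & {tag for tag, _ in sims2}
print(len(overlap))   # usually high, even when the raw vectors look unrelated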

How to do parallel processing in pytorch

I am working on a deep learning problem and solving it with PyTorch. I have two GPUs on the same machine (16273MiB and 12193MiB). I want to use both GPUs for my training (on a video dataset).
I get a warning:
There is an imbalance between your GPUs. You may want to exclude GPU 1 which
has less than 75% of the memory or cores of GPU 0. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.
warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
I also get an error:
raise TypeError('Broadcast function not implemented for CPU tensors')
TypeError: Broadcast function not implemented for CPU tensors
if __name__ == '__main__':
    opt.scales = [opt.initial_scale]
    for i in range(1, opt.n_scales):
        opt.scales.append(opt.scales[-1] * opt.scale_step)
    opt.arch = '{}-{}'.format(opt.model, opt.model_depth)
    opt.mean = get_mean(opt.norm_value)
    opt.std = get_std(opt.norm_value)
    print("opt", opt)
    with open(os.path.join(opt.result_path, 'opts.json'), 'w') as opt_file:
        json.dump(vars(opt), opt_file)

    torch.manual_seed(opt.manual_seed)
    model, parameters = generate_model(opt)
    # print(model)
    pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print("Total number of trainable parameters: ", pytorch_total_params)

    # Define class weights
    if opt.weighted:
        print("Weighted Loss is created")
        if opt.n_finetune_classes == 2:
            weight = torch.tensor([1.0, 3.0])
        else:
            weight = torch.ones(opt.n_finetune_classes)
    else:
        weight = None

    criterion = nn.CrossEntropyLoss()
    if not opt.no_cuda:
        criterion = nn.DataParallel(criterion.cuda())

    if opt.no_mean_norm and not opt.std_norm:
        norm_method = Normalize([0, 0, 0], [1, 1, 1])
    elif not opt.std_norm:
        norm_method = Normalize(opt.mean, [1, 1, 1])
    else:
        norm_method = Normalize(opt.mean, opt.std)

    train_loader = torch.utils.data.DataLoader(
        training_data,
        batch_size=opt.batch_size,
        shuffle=True,
        num_workers=opt.n_threads,
        pin_memory=True)
    train_logger = Logger(
        os.path.join(opt.result_path, 'train.log'),
        ['epoch', 'loss', 'acc', 'precision', 'recall', 'lr'])
    train_batch_logger = Logger(
        os.path.join(opt.result_path, 'train_batch.log'),
        ['epoch', 'batch', 'iter', 'loss', 'acc', 'precision', 'recall', 'lr'])

    if opt.nesterov:
        dampening = 0
    else:
        dampening = opt.dampening
    optimizer = optim.SGD(
        parameters,
        lr=opt.learning_rate,
        momentum=opt.momentum,
        dampening=dampening,
        weight_decay=opt.weight_decay,
        nesterov=opt.nesterov)
    # scheduler = lr_scheduler.ReduceLROnPlateau(
    #     optimizer, 'min', patience=opt.lr_patience)

    if not opt.no_val:
        spatial_transform = Compose([
            Scale(opt.sample_size),
            CenterCrop(opt.sample_size),
            ToTensor(opt.norm_value), norm_method
        ])

    print('run')
    for i in range(opt.begin_epoch, opt.n_epochs + 1):
        if not opt.no_train:
            adjust_learning_rate(optimizer, i, opt.lr_steps)
            train_epoch(i, train_loader, model, criterion, optimizer, opt,
                        train_logger, train_batch_logger)
I have also made changes in my train file:
model = nn.DataParallel(model(), device_ids=[0, 1]).cuda()
outputs = model(inputs)
It does not seem to work properly and is giving an error. Please advise, I am new to PyTorch.
Thanks
As mentioned in this link, you have to do model.cuda() before passing it to nn.DataParallel.
net = nn.DataParallel(model.cuda(), device_ids=[0,1])
https://github.com/pytorch/pytorch/issues/17065
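For completeness, a minimal sketch of the usual pattern (model and train_loader are the objects from the question; the device ids are assumed to be 0 and 1): move the model to the GPU first, wrap it, keep the criterion as a plain module, and move each batch to the default device.

import torch
import torch.nn as nn

device = torch.device("cuda:0")
model = nn.DataParallel(model.cuda(), device_ids=[0, 1])   # .cuda() first, then wrap
criterion = nn.CrossEntropyLoss().cuda()                   # no need to wrap the loss in DataParallel

for inputs, targets in train_loader:
    inputs = inputs.to(device)       # DataParallel scatters the batch across both GPUs
    targets = targets.to(device)     # targets stay on the default device for the loss
    outputs = model(inputs)          # outputs are gathered back onto cuda:0
    loss = criterion(outputs, targets)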

Dirichlet vs Binomial in pymc3

I am having trouble sampling from a Dirichlet/Multinomial distribution with pymc3.
I tried to create a simple test-case to recreate a Beta/Binomial using Dirichlet/Multinomial with n=2, but I can't get it to work.
Below I have some code that works for Binomial but fails for Multinomial.
One of the obvious differences is that the Multinomial model is more constrained:
i.e. to start, rating is set to 10 in the Binomial model, and [10,10] in the Multinomial.
The pymc3 Dirichlet code does say "Only the first k-1 elements of x are expected" but only arrays of shape 2 seem to work in my code.
The output shows that num_friends and rating are being sampled in the Binomial case, but not in the Multinomial case. friends_ratings is being sampled in both. Thanks!
Oh, also Dirichlet('d', np.array([1,1])) crashes with "Floating point error 8". It only appears to fail when two integers of value 1 are passed in. np.array([1.,1.]) works.
import pymc as pm
import numpy as np

print "TEST BINOMIAL"
with pm.Model() as model:
    friends_ratings = pm.Beta('friends_ratings', alpha=1, beta=2)
    num_friends = pm.DiscreteUniform('num_friends', lower=0, upper=100)
    rating = pm.Binomial('rating', n=num_friends, p=friends_ratings)
    step = pm.Metropolis([num_friends, friends_ratings, rating])
    start = {"friends_ratings": .5, "num_friends": 20, 'rating': 10}
    tr = pm.sample(5, step, start=start, progressbar=False)
print "friends", [tr[i]['num_friends'] for i in range(len(tr))]
print "friends_ratings", [tr[i]['friends_ratings'] for i in range(len(tr))]
print "rating", [tr[i]['rating'] for i in range(len(tr))]

print "TEST DIRICHLET"
with pm.Model() as model:
    friends_ratings = pm.Dirichlet('friends_ratings', np.array([1., 1.]), shape=2)
    num_friends = pm.DiscreteUniform('num_friends', lower=0, upper=100)
    rating = pm.Multinomial('rating', n=num_friends, p=friends_ratings, shape=2)
    step = pm.Metropolis([num_friends, friends_ratings, rating])
    start = {'friends_ratings': np.array([0.5, 0.5]), 'num_friends': 20, 'rating': [10, 10]}
    tr = pm.sample(5, step, start=start, progressbar=False)
print "friends", [tr[i]['num_friends'] for i in range(len(tr))]
print "friends_ratings", [tr[i]['friends_ratings'] for i in range(len(tr))]
print "rating", [tr[i]['rating'] for i in range(len(tr))]
Output:
TEST BINOMIAL
friends [22.0, 24.0, 24.0, 23.0, 23.0]
friends_ratings [0.5, 0.5, 0.41, 0.41, 0.41]
rating [10.0, 11.0, 11.0, 11.0, 11.0]
TEST DIRICHLET
friends [20.0, 20.0, 20.0, 20.0, 20.0]
friends_ratings [array([ 0.51369621, 1.490608 ]), ... ]
rating [array([ 10., 10.]), array([ 10., 10.]), ... ]
PyMC3 does not automatically normalize the Dirichlet. So far you have to do this explicitly using the simplex transform. See here for an example.
There is an open issue about making this transform automatic, though: https://github.com/pymc-devs/pymc3/issues/315
EDIT (9/14/2015): PyMC3 now automatically transforms the Dirichlet distribution (as it does any other distribution), so you don't need to specify that manually anymore.
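As a minimal sketch of the same Beta/Binomial-as-Dirichlet/Multinomial idea in a later PyMC3 (the observed counts below are made up for illustration), relying on the automatic transform mentioned in the edit:

import numpy as np
import pymc3 as pm

counts = np.array([7, 13])   # hypothetical observed ratings

with pm.Model() as model:
    # The Dirichlet is mapped to the simplex automatically; no manual transform needed.
    friends_ratings = pm.Dirichlet('friends_ratings', a=np.array([1., 1.]), shape=2)
    rating = pm.Multinomial('rating', n=counts.sum(), p=friends_ratings,
                            shape=2, observed=counts)
    trace = pm.sample(1000, tune=1000)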
