Train GPT2 with Trainer & TrainingArguments using/specifying attention_mask - huggingface-transformers

I'm using Trainer & TrainingArguments to train a GPT2 model, but it does not seem to work well.
My datasets have the ids of the tokens of my corpus and the mask of each text, to indicate where to apply the attention:
Dataset({
    features: ['attention_mask', 'input_ids', 'labels'],
    num_rows: 2012860
})
I am doing the training with Trainer & TrainingArguments, passing my model and my previous dataset as follows. But nowhere do I specify anything about the attention_mask:
training_args = TrainingArguments(
output_dir=path_save_checkpoints,
overwrite_output_dir=True,
num_train_epochs=1,
per_device_train_batch_size = 4,
gradient_accumulation_steps = 4,
logging_steps = 5_000, save_steps=5_000,
fp16=True,
deepspeed="ds_config.json",
remove_unused_columns = True,
debug = True
)
trainer = Trainer(
model=model,
args=training_args,
data_collator=data_collator,
train_dataset=dataset,
tokenizer=tokenizer,
)
trainer.train()
How should I tell the Trainer to use this feature (attention_mask)?
If you take a look at the file /transformers/trainer.py there is no reference to "attention" or "mask".
Thanks in advance!

In the Trainer source code (in compute_loss), each batch is passed to the model like this:
outputs = model(**inputs)
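As long as your collator returns a dictionary that includes the attention_mask key, your attention mask will be passed to your GPT2 model as a keyword argument. That is why trainer.py never mentions "attention" or "mask" by name. Because attention_mask is an argument of the model's forward method, remove_unused_columns = True does not drop that column.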
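To make the mechanism concrete, here is a minimal sketch in plain Python with no transformers dependency. `toy_collator` and `toy_forward` are hypothetical stand-ins for a data collator and a model's forward method; only the `model(**inputs)` call pattern comes from the Trainer itself.

```python
def toy_collator(features, pad_id=0):
    """Pad every example to the longest one in the batch and return a dict."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": []}
    for f in features:
        pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_id] * pad)
        # Padding positions get a 0 so the model can ignore them.
        batch["attention_mask"].append(f["attention_mask"] + [0] * pad)
    return batch


def toy_forward(input_ids, attention_mask=None, labels=None):
    """Stand-in for model.forward: count the positions that are attended to."""
    if attention_mask is None:
        # Without a mask, every position (padding included) counts as real.
        return [len(row) for row in input_ids]
    return [sum(row) for row in attention_mask]


features = [
    {"input_ids": [5, 6, 7], "attention_mask": [1, 1, 1]},
    {"input_ids": [8, 9], "attention_mask": [1, 1]},
]
inputs = toy_collator(features)

# Same call pattern as the Trainer: dict keys become keyword arguments.
attended = toy_forward(**inputs)
```

The dictionary key `attention_mask` is matched to the parameter of the same name purely through `**` unpacking, so nothing has to be configured on the Trainer side.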

Related

User warning when I use more than one gpu with trainer function

I am doing text classification, and to train the model I am using the Trainer class from HuggingFace. The code is:
def get_model(name_model):
    model = AutoModelForSequenceClassification.from_pretrained(
        name_model,
        num_labels=2,
        problem_type = "single_label_classification"
    )
    return model
model = get_model(name_model)
training_args = TrainingArguments(
learning_rate = 3e-5,
max_grad_norm = 1.0,
#weight_decay = 0.01,
num_train_epochs = 3,
per_device_train_batch_size = 32,
per_device_eval_batch_size = 1,
logging_steps = 300,
output_dir = "./training_output",
overwrite_output_dir = True,
seed =42,
fp16=True,
remove_unused_columns = False
)
trainer = Trainer(
model = model,
args = training_args,
train_dataset = train
)
trainer.args._n_gpu = 2
When it finishes training the model (a BERT model), it emits a user warning.
I am afraid that the model is not trained correctly and that its predictions are not reliable.
Do you know how to fix this? With only one GPU there are no warnings.
I tried setting fp16=True because I read in another forum that it could help, and I tried setting is_model_parallel = True, but that didn't fix it. I also tried setting place_model_on_device = True, but that did not work either.

Properly evaluate a test dataset

I trained a machine translation model using huggingface library:
def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # Some simple post-processing
    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)
    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    result = {"bleu": result["score"]}
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    result["gen_len"] = np.mean(prediction_lens)
    result = {k: round(v, 4) for k, v in result.items()}
    return result
trainer = Seq2SeqTrainer(
model,
args,
train_dataset=tokenized_datasets['train'],
eval_dataset=tokenized_datasets['test'],
data_collator=data_collator,
tokenizer=tokenizer,
compute_metrics=compute_metrics
)
trainer.train()
model_dir = './models/'
trainer.save_model(model_dir)
The code above is taken from this Google Colab notebook. After the training, I can see the trained model is saved to the folder models and the metric is calculated. Now I want to load the trained model and do the prediction on a new dataset, here is what I tried:
dataset = load_dataset('csv', data_files='data/training_data.csv')
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
# Tokenize the test dataset
tokenized_datasets = train_test.map(preprocess_function_v2, batched=True)
test_dataset = tokenized_datasets['test']
model = AutoModelForSeq2SeqLM.from_pretrained('models')
model(test_dataset)
It threw the following error:
*** AttributeError: 'Dataset' object has no attribute 'size'
I tried the evaluate() function as well, but it said:
*** torch.nn.modules.module.ModuleAttributeError: 'MarianMTModel' object has no attribute 'evaluate'
And the function eval only prints the configuration of the model.
What is the proper way to evaluate the performance of the trained model on a new dataset?
It turned out that predictions can be produced with the following code (the model must be on the same device as the inputs):
inputs = tokenizer(
questions,
max_length=max_input_length,
truncation=True,
return_tensors='pt',
padding=True).to('cuda')
translation = model.generate(**inputs)
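For a whole test set, a single `generate` call over every sentence may not fit in memory, so a common pattern is to translate in small batches. The sketch below shows only the batching logic in plain Python. `translate_fn` is a hypothetical callable that would wrap the tokenizer and `model.generate` calls above and return decoded strings; `fake_translate` is a stand-in used purely for illustration.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_in_batches(texts, translate_fn, batch_size=8):
    """Apply `translate_fn` to each batch and collect the results in input order."""
    predictions = []
    for batch in batched(texts, batch_size):
        predictions.extend(translate_fn(batch))
    return predictions


def fake_translate(batch):
    """Hypothetical stand-in for tokenizer + model.generate + batch_decode."""
    return [text.upper() for text in batch]


sentences = ["a b", "c d", "e f", "g h", "i j"]
outputs = predict_in_batches(sentences, fake_translate, batch_size=2)
```

The collected predictions could then be scored against the reference translations with the same metric used in compute_metrics during training.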

Uploading models with custom forward functions to the huggingface model hub?

Is it possible to upload a model with a custom forward function to the huggingface model hub?
I can see how to do it for a standard model, but I can't see how to customise the forward function and still upload it.
Yes, absolutely. You can create your own model with any number of added layers or customisations and upload it to the model hub. Here is a demo that walks through the entire process.
Uploading custom model to 🤗 model hub
import tqdm
from datasets import load_dataset
import transformers
from transformers import AutoTokenizer, AutoModel, BertConfig
from torch.optim import AdamW  # transformers.AdamW is deprecated in favour of the PyTorch optimizer
from transformers import get_scheduler
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
# setting device to `cuda` if gpu exists
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
# initialising the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("google/bert_uncased_L-2_H-128_A-2")
bert = AutoModel.from_pretrained("google/bert_uncased_L-2_H-128_A-2")
def tokenize_function(examples):
    '''Function for tokenizing raw texts'''
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=128)
# downloading IMDB dataset from 🤗 `datasets`
raw_datasets = load_dataset("imdb")
# Running tokenizing function on the raw texts
tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
# for simplicity I have taken only the train split
tokenized_datasets = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
# Now lets create the torch Dataset class
class IMDBClassificationDataset(Dataset):
    def __init__(self, dataset):
        self.dataset = dataset
    def __len__(self):
        return len(self.dataset)
    def __getitem__(self, idx):
        d = self.dataset[idx]
        ids = torch.tensor(d['input_ids'])
        mask = torch.tensor(d['attention_mask'])
        label = torch.tensor(d['label'])
        return ids, mask, label
# Preparing the dataset and the Dataloader
dataset = IMDBClassificationDataset(tokenized_datasets)
train_dataloader = DataLoader(dataset, shuffle=True, batch_size=8)
# Now lets create a custom Bert model
class CustomBert(transformers.PreTrainedModel):
    '''Custom model class
    ------------------
    The trick is to inherit from `transformers.PreTrainedModel` rather than `nn.Module`.
    You also need to pass the model config during initialisation.'''
    def __init__(self, bert):
        super(CustomBert, self).__init__(config=BertConfig.from_pretrained('google/bert_uncased_L-2_H-128_A-2'))
        self.bert = bert
        self.l1 = nn.Linear(128, 1)
        self.do = nn.Dropout(0.1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
    def forward(self, sent_id, mask):
        '''For simplicity I have added only one linear layer, you can create any type of network you want'''
        bert_out = self.bert(sent_id, attention_mask=mask)
        # hidden state of the first token ([CLS]) for each sequence
        o = bert_out.last_hidden_state[:,0,:]
        o = self.do(o)
        o = self.relu(o)
        o = self.l1(o)
        o = self.sigmoid(o)
        return o
# initialising model, loss and optimizer
model = CustomBert(bert)
model.to(device)
criterion = torch.nn.BCELoss()
optimizer = AdamW(model.parameters(), lr=5e-5)
# setting epochs, num_training_steps and the lr_scheduler
num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(
"linear",
optimizer=optimizer,
num_warmup_steps=0,
num_training_steps=num_training_steps
)
# training loop
model.train()
for epoch in tqdm.tqdm(range(num_epochs)):
    for batch in train_dataloader:
        ids, masks, labels = batch
        labels = labels.type(torch.float32)
        o = model(ids.to(device), masks.to(device))
        loss = criterion(torch.squeeze(o), labels.to(device))
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
# save the tokenizer and the model in `./test-model/` directory
tokenizer.save_pretrained("./test-model/")
model.save_pretrained("./test-model/", push_to_hub=False)
Now create a new model in 🤗 and push all the contents inside the test-model to 🤗 model hub.
To verify the uploaded model, you can load it with 🤗's pipeline and check that it runs.
from transformers import pipeline
# this is a classification model, so pass `text-classification` as the task
classifier = pipeline('text-classification', model='tanmoyio/test-model')
classifier("This movie was superb")
It will output something like this
[{'label': 'LABEL_0', 'score': 0.5571992993354797}]
This is a real demo, check the model here - https://huggingface.co/tanmoyio/test-model. Let me know if you have further questions.

XAI for transformer custom model using AllenNLP

I have been working on an NER problem for a Vietnamese dataset with 15 tags in IO format. I have been using the AllenNLP Interpret Toolkit with my model, but I cannot get it fully configured.
I am using the pre-trained language model "xlm-roberta-base" from HuggingFace. I concatenate the last 4 hidden layers and pass the result through a linear layer. The model architecture is shown in the source below.
class BaseBertSoftmax(nn.Module):
    def __init__(self, model, drop_out , num_labels):
        super(BaseBertSoftmax, self).__init__()
        self.num_labels = num_labels
        self.model = model
        self.dropout = nn.Dropout(drop_out)
        self.classifier = nn.Linear(4*768, num_labels) # 4 last of layer
    def forward_custom(self, input_ids, attention_mask=None,
                       labels=None, head_mask=None):
        outputs = self.model(input_ids = input_ids, attention_mask=attention_mask)
        sequence_output = torch.cat((outputs[1][-1], outputs[1][-2], outputs[1][-3], outputs[1][-4]),-1)
        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output) # bsz, seq_len, num_labels
        outputs = (logits,) + outputs[2:] # add hidden states and attention if they are here
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss(ignore_index=0)
            if attention_mask is not None:
                active_loss = attention_mask.view(-1) == 1
                active_logits = logits.view(-1, self.num_labels)[active_loss]
                active_labels = labels.view(-1)[active_loss]
                loss = loss_fct(active_logits, active_labels)
            else:
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            outputs = (loss,) + outputs
        return outputs #scores, (hidden_states), (attentions)
What steps do I have to take to integrate this model to AllenNLP Interpret?
Could you please help me with this problem?

How do I prepend a sequential model to a pretrained model in Keras?

I want to put a 4-layer dense network in front of a pretrained model like nasnet_mobile. I have tried this several different ways, but they all give headaches (aka errors). What is the way to do this in keras+tensorflow2 that works?
Thoughts:
Is there some "flag" where I have to specify the output of the Dense as integer, or picture?
Is there some "flag" in the pretrained model where I have to allow it to connect?
Do I need to manually make a clone of the pretrained, load it with pretrained weights, and then try one of the above; perhaps the pretrained are a different class than the created? (update) If I'm copying, is there an easy way to make sure I get the structure the same so that when I have set_weights(get_weights(…)) it doesn't error?
None of the above...
CODE:
#LIBRARIES
import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Reshape, Conv2D, MaxPool2D , Flatten, Input
my_input_shape = (224,224,3)
#DENSE MODEL
my_inputs = Input(shape=my_input_shape)
hidden_1 = Dense(units=8, activation='relu')(my_inputs)
#make the output layer
hidden_2 = Dense(units=np.prod(my_input_shape),  # np.product was removed in NumPy 2.0
                 activation='sigmoid')(hidden_1)
transformed = keras.layers.Reshape(my_input_shape,)(hidden_2)
dense_model = Model(inputs=my_inputs, outputs=transformed)
#PRETRAINED MODEL
pretrained_model = keras.applications.nasnet.NASNetMobile(weights = 'imagenet',
include_top = False,
input_shape=my_input_shape)
#Option 1
combined_model_1 = keras.applications.nasnet.NASNetMobile(weights = 'imagenet',
include_top = False,
input_tensor=transformed)
#Option 2
combined_model_2 = Model(inputs=dense_model.input, outputs=pretrained_model.output)
#Option 3a
combined_model_3a = keras.applications.nasnet.NASNetMobile(weights = 'imagenet',
include_top = False,
input_tensor=my_input_shape)(dense_model)
#Option 3b
combined_model_3b = keras.applications.nasnet.NASNetMobile(weights = 'imagenet',
include_top = False)(dense_model)
#Option 4
combined_model_4 = keras.applications.nasnet.NASNetMobile(weights = 'imagenet',
include_top = False,
input_tensor=dense_model)
Problem:
Given the above code, I want to daisy-chain the Dense model in front of the pretrained model. I want to feed an image into dense, have it propagate through dense, then be the input to the pretrained, and go through the pretrained.
Why not just do this:
inp = Input(shape=my_input_shape)
x = dense_model(inp)
x = pretrained_model(x)
final_model = Model(inp, x)
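One thing worth checking before chaining the two models: a Keras Dense layer applied to an input with more than two dimensions acts on the last axis only, so the dense model in the question may already fail at its Reshape step. The sketch below is plain Python shape arithmetic, not Keras code; `dense_output_shape` is a hypothetical helper written for this illustration, and the shapes leave out the batch dimension.

```python
from math import prod


def dense_output_shape(input_shape, units):
    """Shape after a Dense layer: only the last axis is replaced by `units`."""
    return tuple(input_shape[:-1]) + (units,)


my_input_shape = (224, 224, 3)

# Dense(units=8) applied directly to the image tensor
hidden_1_shape = dense_output_shape(my_input_shape, 8)

# Dense(units=224*224*3) applied to hidden_1
hidden_2_shape = dense_output_shape(hidden_1_shape, prod(my_input_shape))

# Reshape only works when the element counts match
reshape_is_valid = prod(hidden_2_shape) == prod(my_input_shape)
```

Here `hidden_2_shape` still carries the two leading 224 axes, so the element count does not match the Reshape target. If that is what happens in your setup, flattening the image before the Dense layers (Flatten is already imported in the question) would make the counts match, after which the chaining shown above should apply.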
