User warning when I use more than one GPU with the Trainer function - huggingface-transformers

I am doing text classification, and for training the model I am using the Trainer class from HuggingFace. The code is:
def get_model(name_model):
    model = AutoModelForSequenceClassification.from_pretrained(
        name_model,
        num_labels=2,
        problem_type="single_label_classification"
    )
    return model

model = get_model(name_model)

training_args = TrainingArguments(
    learning_rate=3e-5,
    max_grad_norm=1.0,
    # weight_decay=0.01,
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=1,
    logging_steps=300,
    output_dir="./training_output",
    overwrite_output_dir=True,
    seed=42,
    fp16=True,
    remove_unused_columns=False
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train
)

trainer.args._n_gpu = 2
So, when it finishes training the model (which is a BERT model), it shows a user warning.
I am afraid that the model is not trained correctly and that the predictions it makes are not reliable.
Do you know how to fix this? With only one GPU there are no warnings.
I tried setting fp16=True because I read in another forum that it could help, and I also tried setting is_model_parallel=True, but that did not fix it. Setting place_model_on_device=True did not work either.
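For reference, here is a minimal hedged sketch (reusing the model, training_args and train objects from the question, not a confirmed fix for the warning) of the usual way to control multi-GPU training: Trainer detects every visible GPU on its own and wraps the model in torch.nn.DataParallel, so the set of GPUs is normally chosen through CUDA_VISIBLE_DEVICES rather than by assigning to the private trainer.args._n_gpu field.

import os

# Make exactly two GPUs visible before CUDA is initialized; Trainer then
# detects and uses both of them automatically.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

from transformers import Trainer

trainer = Trainer(
    model=model,              # model, training_args and train are the
    args=training_args,       # objects defined in the question above
    train_dataset=train,
)
print(trainer.args.n_gpu)     # should report 2; no need to touch _n_gpu
trainer.train()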

Related

Is "insample" in mlr3tuning resampling can be used when we want to do hyperparameter tuning with the full dataset?

I've been trying to tune hyperparameters for a survival SVM model. I used the AutoTuner class from the mlr3tuning package. I want to tune on the whole dataset (no train & test split). I found the resampling class "insample"; the mlr3 dictionary says it "Uses all observations as training and as test set."
My question is: can the "insample" resampling in mlr3tuning be used when we want to do hyperparameter tuning with the full dataset, and if so, why do I get a different concordance index when I plug the tuned hyperparameters into the survivalsvm function from the survivalsvm package?
This is the code I used for hyperparameter tuning:
veteran <- veteran
set.seed(1)

task = as_task_surv(x = veteran, time = 'time', event = 'status')

learner = lrn("surv.svm", type = "hybrid", diff.meth = "makediff3",
              gamma.mu = c(0.1, 0.1), kernel = 'rbf_kernel')

search_space = ps(gamma = p_dbl(2^-5, 2^5), mu = p_dbl(2^-5, 2^5))
search_space$trafo = function(x, param_set) {
  x$gamma.mu = c(x$gamma, x$mu)
  x$gamma = x$mu = NULL
  x
}

ssvm_at = AutoTuner$new(
  learner = learner,
  resampling = rsmp("insample"),
  search_space = search_space,
  measure = msr('surv.cindex'),
  terminator = trm('evals', n_evals = 5),
  tuner = tnr('grid_search'))

ssvm_at$train(task)
And this is the code I've been trying with the survivalsvm function from the survivalsvm package:
survsvm.reg <- survivalsvm(Surv(veteran$time, veteran$status) ~ .,
                           data = veteran,
                           type = "hybrid", gamma.mu = c(32, 32), diff.meth = "makediff3",
                           opt.meth = "quadprog", kernel = "rbf_kernel")

pred.survsvm.reg <- predict(survsvm.reg, veteran)
conindex(pred.survsvm.reg, veteran$time)

Fine-tune custom pre-trained language model

I am new to fine-tuning transformer models. I am trying to fine-tune this model on my dataset, but I got an error. The code:
distil_bert = 'Chramer/remote-sensing-distilbert-cased'

config = transformers.BertConfig(dropout=0.2, attention_dropout=0.2)
config.output_hidden_states = False
transformer_model = transformers.TFBertModel.from_pretrained(distil_bert, config=config, from_pt=True)

input_ids = tf.keras.layers.Input(shape=(128,), name='input_ids', dtype='int32')
attention_mask = tf.keras.layers.Input(shape=(128,), name='attention_mask', dtype='int32')
input_segments = tf.keras.layers.Input(shape=(128,), name='input_segments', dtype='int32')

embedding_layer = transformer_model(input_ids, attention_mask, input_segments)[0]
X = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(50, return_sequences=True,
                                                       dropout=0.1, recurrent_dropout=0.1))(embedding_layer)
X = tf.keras.layers.GlobalMaxPool1D()(X)
X = tf.keras.layers.Dense(50, activation='relu')(X)
X = tf.keras.layers.Dropout(0.2)(X)
X = tf.keras.layers.Dense(1, activation='sigmoid')(X)

model = tf.keras.Model(inputs={'input_ids': input_ids,
                               'attention_mask': attention_mask,
                               'input_segments': input_segments},
                       outputs=X)
I got an error (posted as a screenshot, not reproduced here).
I would be thankful if anyone could help. I am using TensorFlow.
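Since the error itself is only visible in the screenshot, here is a hedged sketch, not a confirmed fix, of one common way to wire a Hugging Face TF checkpoint into a Keras functional model. It assumes the checkpoint above is DistilBERT-based (as its name suggests), so it is loaded through TFAutoModel instead of TFBertModel/BertConfig and receives no segment-id input, since DistilBERT takes no token_type_ids.

import tensorflow as tf
from transformers import TFAutoModel

model_name = 'Chramer/remote-sensing-distilbert-cased'
# Let the Auto class pick the architecture that matches the checkpoint.
transformer_model = TFAutoModel.from_pretrained(model_name, from_pt=True)

input_ids = tf.keras.layers.Input(shape=(128,), name='input_ids', dtype='int32')
attention_mask = tf.keras.layers.Input(shape=(128,), name='attention_mask', dtype='int32')

# Pass the inputs by keyword and take the last hidden state.
embedding_layer = transformer_model(
    input_ids=input_ids, attention_mask=attention_mask).last_hidden_state

X = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(50, return_sequences=True, dropout=0.1))(embedding_layer)
X = tf.keras.layers.GlobalMaxPool1D()(X)
X = tf.keras.layers.Dense(50, activation='relu')(X)
X = tf.keras.layers.Dense(1, activation='sigmoid')(X)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=X)
model.summary()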

Properly evaluate a test dataset

I trained a machine translation model using the HuggingFace library:
def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # Some simple post-processing
    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)
    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    result = {"bleu": result["score"]}
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    result["gen_len"] = np.mean(prediction_lens)
    result = {k: round(v, 4) for k, v in result.items()}
    return result

trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)

trainer.train()

model_dir = './models/'
trainer.save_model(model_dir)
The code above is taken from this Google Colab notebook. After training, I can see that the trained model is saved to the models folder and the metric is calculated. Now I want to load the trained model and make predictions on a new dataset; here is what I tried:
dataset = load_dataset('csv', data_files='data/training_data.csv')
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# Tokenize the test dataset
tokenized_datasets = train_test.map(preprocess_function_v2, batched=True)
test_dataset = tokenized_datasets['test']

model = AutoModelForSeq2SeqLM.from_pretrained('models')
model(test_dataset)
It threw the following error:
*** AttributeError: 'Dataset' object has no attribute 'size'
I tried the evaluate() function as well, but it said:
*** torch.nn.modules.module.ModuleAttributeError: 'MarianMTModel' object has no attribute 'evaluate'
And the eval() function only prints the configuration of the model.
What is the proper way to evaluate the performance of the trained model on a new dataset?
It turned out that predictions can be produced using the following code:
inputs = tokenizer(
    questions,
    max_length=max_input_length,
    truncation=True,
    return_tensors='pt',
    padding=True).to('cuda')

translation = model.generate(**inputs)
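To turn the generated ids back into text and into a score, a minimal hedged sketch is shown below; references (the target sentences matching questions) is an assumed variable, and metric is assumed to be the same sacrebleu-style metric object used in compute_metrics above.

# Decode the generated token ids and score them with the same metric object
# used during training. `references` is assumed, not taken from the question.
decoded_preds = tokenizer.batch_decode(translation, skip_special_tokens=True)
result = metric.compute(predictions=decoded_preds,
                        references=[[ref] for ref in references])
print(round(result["score"], 4))

Alternatively, the Seq2SeqTrainer itself can score a tokenized split via trainer.evaluate(eval_dataset=test_dataset), provided the training arguments were created with predict_with_generate=True so that compute_metrics receives generated tokens rather than raw logits.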

XAI for a custom transformer model using AllenNLP

I have been working on an NER problem for a Vietnamese dataset with 15 tags in IO format. I have been trying to use the AllenNLP Interpret toolkit with my model, but I cannot configure it completely.
I used the pre-trained language model "xlm-roberta-base" from HuggingFace. I concatenate the last 4 BERT layers and pass them through a linear layer. You can see the model architecture in the source below.
class BaseBertSoftmax(nn.Module):
    def __init__(self, model, drop_out, num_labels):
        super(BaseBertSoftmax, self).__init__()
        self.num_labels = num_labels
        self.model = model
        self.dropout = nn.Dropout(drop_out)
        self.classifier = nn.Linear(4 * 768, num_labels)  # last 4 layers concatenated

    def forward_custom(self, input_ids, attention_mask=None,
                       labels=None, head_mask=None):
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
        sequence_output = torch.cat((outputs[1][-1], outputs[1][-2],
                                     outputs[1][-3], outputs[1][-4]), -1)
        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)  # bsz, seq_len, num_labels
        outputs = (logits,) + outputs[2:]  # add hidden states and attention if they are here
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss(ignore_index=0)
            if attention_mask is not None:
                active_loss = attention_mask.view(-1) == 1
                active_logits = logits.view(-1, self.num_labels)[active_loss]
                active_labels = labels.view(-1)[active_loss]
                loss = loss_fct(active_logits, active_labels)
            else:
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            outputs = (loss,) + outputs
        return outputs  # scores, (hidden_states), (attentions)
What steps do I have to take to integrate this model with AllenNLP Interpret?
Could you please help me with this problem?
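For context, here is a minimal, hypothetical sketch of how the class above can be instantiated and called on its own. It assumes the backbone is loaded with output_hidden_states=True and without the pooling layer, so that outputs[1] is the tuple of per-layer hidden states that forward_custom concatenates; any AllenNLP-specific Model/Predictor wrapping for Interpret would still need to be built around this.

import torch
from transformers import AutoTokenizer, XLMRobertaModel

# Assumed setup: no pooling layer and hidden states exposed, so that
# outputs[1] inside forward_custom is the tuple of layer outputs.
backbone = XLMRobertaModel.from_pretrained(
    "xlm-roberta-base", output_hidden_states=True, add_pooling_layer=False)
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

ner_model = BaseBertSoftmax(model=backbone, drop_out=0.1, num_labels=15)

batch = tokenizer(["Hà Nội là thủ đô của Việt Nam"], return_tensors="pt")
with torch.no_grad():
    logits = ner_model.forward_custom(batch["input_ids"],
                                      batch["attention_mask"])[0]
print(logits.shape)  # (batch_size, seq_len, num_labels)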

Train GPT2 with Trainer & TrainingArguments using/specifying attention_mask

I'm using Trainer & TrainingArguments to train a GPT2 model, but it seems that this does not work well.
My dataset has the token ids of my corpus and the mask of each text, indicating where to apply the attention:
Dataset({
    features: ['attention_mask', 'input_ids', 'labels'],
    num_rows: 2012860
})
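For context, a hedged sketch of how a dataset with exactly these three columns is typically produced; texts (a list of raw strings) and the 128-token limit are assumptions, not taken from the question.

from datasets import Dataset

def tokenize(batch):
    enc = tokenizer(batch["text"], truncation=True, max_length=128)
    enc["labels"] = enc["input_ids"].copy()  # causal LM: labels mirror input_ids
    return enc

# `texts` is an assumed list of raw strings.
dataset = Dataset.from_dict({"text": texts}).map(
    tokenize, batched=True, remove_columns=["text"])
print(dataset)  # features: ['attention_mask', 'input_ids', 'labels']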
I am doing the training with Trainer & TrainingArguments, passing my model and my previous dataset as follows. But nowhere do I specify anything about the attention_mask:
training_args = TrainingArguments(
    output_dir=path_save_checkpoints,
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    logging_steps=5_000, save_steps=5_000,
    fp16=True,
    deepspeed="ds_config.json",
    remove_unused_columns=True,
    debug=True
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
    tokenizer=tokenizer,
)

trainer.train()
How should I tell the Trainer to use this feature (attention_mask)?
If you take a look at the file /transformers/trainer.py, there is no reference to "attention" or "mask".
Thanks in advance!
Somewhere in the source code you will see that the inputs are passed to the model roughly like this:
outputs = model(**inputs)
As long as your collator returns a dictionary that includes the attention_mask key, your attention mask will be passed to your GPT2 model.
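One quick way to convince yourself is the hedged sketch below, which reuses the dataset, data_collator and model objects from the question: build one batch by hand the way the Trainer would and check that attention_mask survives all the way into model(**inputs).

# Collate two examples exactly as the Trainer would and inspect the keys.
batch = data_collator([dataset[i] for i in range(2)])
print(batch.keys())  # expect input_ids, attention_mask and labels

# Trainer ultimately calls model(**inputs), so every key in the batch,
# including attention_mask, reaches the GPT2 forward pass.
outputs = model(**{k: v.to(model.device) for k, v in batch.items()})
print(outputs.loss)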
