I'm trying to fine-tune a T5 model on my own dataset for grammatical error correction, but when I run the model I keep getting all zeros for my results. I'm following the Hugging Face translation tutorial.
I think the problem is in the preprocessing function, but I can't figure out why.
prefix = ''
max_input_length = 128
max_target_length = 128
source_lang = "ar"
target_lang = "ar"
def preprocess_function(examples):
    inputs = [prefix + ex for ex in examples["original"]]
    targets = [ex for ex in examples["corrected"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)
    # Setup the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=max_target_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
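One way to check whether preprocessing is the culprit is to inspect what the tokenizer produces for the targets. If most label ids are the unknown token, the tokenizer vocabulary may not cover the target language, and a score near zero would not be surprising. The following is only a sketch under assumptions: unk_fraction is a hypothetical helper, the toy ids are made up, and in real use the ids would come from model_inputs["labels"] and the unknown id from tokenizer.unk_token_id.

```python
def unk_fraction(batch_label_ids, unk_token_id):
    """Return the fraction of label tokens that equal the unknown-token id."""
    total = sum(len(ids) for ids in batch_label_ids)
    if total == 0:
        return 0.0
    unknown = sum(1 for ids in batch_label_ids for tok in ids if tok == unk_token_id)
    return unknown / total


# Toy ids standing in for model_inputs["labels"] (assumption, not real tokenizer output).
# Here 2 plays the role of tokenizer.unk_token_id.
toy_labels = [[2, 2, 2, 1], [2, 5, 2, 1]]
print(unk_fraction(toy_labels, unk_token_id=2))
```

A high fraction would point at the tokenizer or the checkpoint rather than at the preprocessing function itself.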


Can the "insample" resampling in mlr3tuning be used to tune hyperparameters on the full dataset?

I've been trying to tune hyperparameters for a survival SVM model using the AutoTuner class from the mlr3tuning package. I want to tune on the whole dataset (no train/test split). I found the resampling class "insample"; the mlr3 dictionary describes it as "Uses all observations as training and as test set."
My question is: can the "insample" resampling in mlr3tuning be used when we want to tune hyperparameters on the full dataset? And if so, why do I get a different concordance index when I pass the tuned hyperparameters to the survivalsvm function from the survivalsvm package?
This is the code I used for hyperparameter tuning:
task = as_task_surv(x = veteran, time = 'time', event = 'status')
learner = lrn("surv.svm", type = "hybrid", diff.meth = "makediff3", gamma.mu = c(0.1, 0.1), kernel = 'rbf_kernel')
search_space = ps(gamma = p_dbl(2^-5, 2^5), mu = p_dbl(2^-5, 2^5))
# Combine the two tuned values into the single gamma.mu vector the learner expects
search_space$trafo = function(x, param_set) {
  x$gamma.mu = c(x$gamma, x$mu)
  x$gamma = x$mu = NULL
  x
}
ssvm_at = AutoTuner$new(
  learner = learner,
  resampling = rsmp("insample"),
  search_space = search_space,
  measure = msr('surv.cindex'),
  terminator = trm('evals', n_evals = 5),
  tuner = tnr('grid_search'))
And this is the code I tried with the survivalsvm function from the survivalsvm package:
survsvm.reg <- survivalsvm(Surv(veteran$time, veteran$status) ~ .,
                           data = veteran,
                           type = "hybrid", gamma.mu = c(32, 32), diff.meth = "makediff3",
                           opt.meth = "quadprog", kernel = "rbf_kernel")
pred.survsvm.reg <- predict(survsvm.reg, veteran)
conindex(pred.survsvm.reg, veteran$time)
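For reference when comparing the two numbers, here is a minimal Python sketch of Harrell's concordance index. It is not the implementation used by mlr3 or by survivalsvm. Libraries can differ in how they handle ties and in whether the prediction is read as a risk (higher means earlier event) or as a survival time (higher means later event), so the same model can yield different values. With "insample" resampling the score is also computed on the same rows the model was fitted on. The function name and the toy data below are assumptions for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. Higher risk is expected to mean an earlier event.
    Tied risk scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): the second subject is censored.
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
print(concordance_index(times, events, [4, 3, 2, 1]))  # risk decreases with time
print(concordance_index(times, events, [1, 2, 3, 4]))  # same scores read the other way round
```

The second call shows how reversing the orientation of the score flips the index, which is one thing worth checking when two packages disagree.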

Fine-tune a pre-trained model

I am new to transformer-based models. I am trying to fine-tune the following model on my dataset. The code:
and I got the following error:
I will be thankful if anyone could help.
The preprocessing steps I followed:
input_ids_t = []
attention_masks_t = []
for sent in df_train['text_a']:
    encoded_dict = tokenizer.encode_plus(
        sent,  # sentence to encode
        add_special_tokens = True,
        max_length = 128,
        pad_to_max_length = True,
        return_attention_mask = True,
        return_tensors = 'tf',
    )
    # Collect the encoded ids and masks for every sentence
    input_ids_t.append(encoded_dict['input_ids'])
    attention_masks_t.append(encoded_dict['attention_mask'])
# Convert the lists into tensors.
input_ids_t = tf.concat(input_ids_t, axis=0)
attention_masks_t = tf.concat(attention_masks_t, axis=0)
labels_t = np.asarray(df_train['label'])
I did the same for the testing data. Then:
train_data =,attention_masks_t,labels_t))
and the same for testing data
It sounds like you are feeding the transformer_model 1 input instead of 3. Try removing the square brackets around transformer_model([input_ids, input_mask, segment_ids])[0] so that it reads transformer_model(input_ids, input_mask, segment_ids)[0]. That way, the function will have 3 arguments and not just 1.
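The difference between the two call styles can be seen with a plain Python function. toy_model below is a hypothetical stand-in, not a transformers class; it only reports how many positional arguments it received. How a particular model class treats a list passed as its first argument is up to that library, so this sketch illustrates Python call semantics only.

```python
def toy_model(*inputs):
    """Hypothetical stand-in for a model call: returns the number of positional arguments."""
    return len(inputs)


input_ids, input_mask, segment_ids = [1, 2], [1, 1], [0, 0]

# With square brackets the function receives ONE argument: a list of three items.
print(toy_model([input_ids, input_mask, segment_ids]))

# Without them it receives THREE separate arguments.
print(toy_model(input_ids, input_mask, segment_ids))
```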

Fine-tune custom pre-trained language model

I am new to fine-tuning transformer models. I am trying to fine-tune this model on my dataset, but I got an error. The code:
distil_bert = 'Chramer/remote-sensing-distilbert-cased'
config = transformers.BertConfig(dropout=0.2, attention_dropout=0.2)
transformer_model = transformers.TFBertModel.from_pretrained(distil_bert, config=config, from_pt=True)
input_ids = tf.keras.layers.Input(shape=(128,), name='input_ids', dtype='int32')
attention_mask = tf.keras.layers.Input(shape=(128,), name='attention_mask', dtype='int32')
input_segments = tf.keras.layers.Input(shape=(128,), name='input_segments', dtype='int32')
embedding_layer = transformer_model(input_ids, attention_mask, input_segments)[0]
X = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(50, return_sequences=True,
                                                       dropout=0.1, recurrent_dropout=0.1))(embedding_layer)
X = tf.keras.layers.GlobalMaxPool1D()(X)
X = tf.keras.layers.Dense(50, activation='relu')(X)
X = tf.keras.layers.Dropout(0.2)(X)
X = tf.keras.layers.Dense(1, activation='sigmoid')(X)
model = tf.keras.Model(inputs={'input_ids': input_ids, 'attention_mask': attention_mask,
                               'input_segments': input_segments}, outputs=X)
I got this error:
I would be thankful if anyone could help. I am using TensorFlow.

Properly evaluate a test dataset

I trained a machine translation model using the Hugging Face library:
def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # Some simple post-processing
    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)
    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    result = {"bleu": result["score"]}
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    result["gen_len"] = np.mean(prediction_lens)
    result = {k: round(v, 4) for k, v in result.items()}
    return result
trainer = Seq2SeqTrainer(
model_dir = './models/'
The code above is taken from this Google Colab notebook. After training, I can see that the trained model is saved to the models folder and the metric is calculated. Now I want to load the trained model and run prediction on a new dataset. Here is what I tried:
dataset = load_dataset('csv', data_files='data/training_data.csv')
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
# Tokenize the test dataset
tokenized_datasets =, batched=True)
test_dataset = tokenized_datasets['test']
model = AutoModelForSeq2SeqLM.from_pretrained('models')
It threw the following error:
*** AttributeError: 'Dataset' object has no attribute 'size'
I tried the evaluate() function as well, but it said:
*** torch.nn.modules.module.ModuleAttributeError: 'MarianMTModel' object has no attribute 'evaluate'
And the function eval only prints the configuration of the model.
What is the proper way to evaluate the performance of the trained model on a new dataset?
It turned out that the prediction can be produced using the following code:
inputs = tokenizer(
translation = model.generate(**inputs)

XAI for a custom transformer model using AllenNLP

I have been working on an NER problem for a Vietnamese dataset with 15 tags in IO format. I have been using the AllenNLP Interpret toolkit with my model, but I cannot get it fully configured.
I used the pre-trained language model "xlm-roberta-base" from Hugging Face. I concatenate the last 4 BERT layers and pass the result through a linear layer. The model architecture is shown in the source below.
class BaseBertSoftmax(nn.Module):
    def __init__(self, model, drop_out, num_labels):
        super(BaseBertSoftmax, self).__init__()
        self.num_labels = num_labels
        self.model = model
        self.dropout = nn.Dropout(drop_out)
        self.classifier = nn.Linear(4*768, num_labels)  # last 4 layers, concatenated

    def forward_custom(self, input_ids, attention_mask=None,
                       labels=None, head_mask=None):
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
        # Concatenate the last 4 hidden layers along the feature dimension
        sequence_output = torch.cat((outputs[1][-1], outputs[1][-2], outputs[1][-3], outputs[1][-4]), -1)
        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)  # bsz, seq_len, num_labels
        outputs = (logits,) + outputs[2:]  # add hidden states and attention if they are here
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss(ignore_index=0)
            if attention_mask is not None:
                # Only keep the loss on non-padding tokens
                active_loss = attention_mask.view(-1) == 1
                active_logits = logits.view(-1, self.num_labels)[active_loss]
                active_labels = labels.view(-1)[active_loss]
                loss = loss_fct(active_logits, active_labels)
            else:
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            outputs = (loss,) + outputs
        return outputs  # (loss), scores, (hidden_states), (attentions)
What steps do I have to take to integrate this model into AllenNLP Interpret?
Could you please help me with this problem?
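As a side note on the loss computation in forward_custom above, the sketch below reproduces in plain Python what a mean-reduced cross-entropy does when padding positions are dropped via the attention mask and one label id is ignored. masked_cross_entropy is a hypothetical helper written for illustration, not part of PyTorch or AllenNLP. It makes one consequence visible: with ignore_index=0, tokens whose gold tag has id 0 contribute nothing to the loss.

```python
import math


def masked_cross_entropy(logits, labels, attention_mask, ignore_index=0):
    """Mean cross-entropy over tokens with mask == 1 and label != ignore_index.

    logits: one list of per-class scores per token (already flattened)
    labels: one gold class id per token
    attention_mask: 1 for real tokens, 0 for padding
    """
    total, count = 0.0, 0
    for scores, label, mask in zip(logits, labels, attention_mask):
        if mask != 1 or label == ignore_index:
            continue  # padding or ignored label: no contribution
        top = max(scores)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_sum_exp - scores[label]
        count += 1
    return total / count if count else float("nan")


# Toy example (made-up numbers): three tokens, two classes.
logits = [[0.0, 0.0], [0.0, 0.0], [2.0, 0.0]]
labels = [1, 0, 1]          # the second token has the ignored label id 0
attention_mask = [1, 1, 0]  # the third token is padding
loss = masked_cross_entropy(logits, labels, attention_mask)
print(loss)
```

Only the first token counts here, so the result equals the cross-entropy of a uniform two-class prediction.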
