Fine-tune a pre-trained model - huggingface-transformers

I am new to transformer-based models. I am trying to fine-tune the following model (https://huggingface.co/Chramer/remote-sensing-distilbert-cased) on my dataset. The code:
[screenshot of the code]
and I got the following error:
[screenshot of the error]
I would be thankful if anyone could help.
The preprocessing steps I followed:
import numpy as np
import tensorflow as tf
input_ids_t = []
attention_masks_t = []
for sent in df_train['text_a']:
    encoded_dict = tokenizer.encode_plus(
        sent,
        add_special_tokens=True,
        max_length=128,
        padding='max_length',  # replaces the deprecated pad_to_max_length=True
        truncation=True,       # cut sentences longer than max_length
        return_attention_mask=True,
        return_tensors='tf',
    )
    input_ids_t.append(encoded_dict['input_ids'])
    attention_masks_t.append(encoded_dict['attention_mask'])
# Convert the lists into tensors of shape (num_sentences, 128).
input_ids_t = tf.concat(input_ids_t, axis=0)
attention_masks_t = tf.concat(attention_masks_t, axis=0)
labels_t = np.asarray(df_train['label'])
I did the same for the test data. Then:
train_data = tf.data.Dataset.from_tensor_slices((input_ids_t, attention_masks_t, labels_t))
and the same for the test data.
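One more thing worth checking alongside the answer below, assuming train_data is passed straight to model.fit(): Keras unpacks each dataset element as (inputs, targets) or (inputs, targets, sample_weights), so a flat 3-tuple feeds the attention masks in as targets and the labels in as sample weights. A minimal sketch that nests the two model inputs in a dict instead; the random arrays are dummy stand-ins for input_ids_t, attention_masks_t and labels_t, and the dict keys follow the tokenizer's output names.

```python
import numpy as np
import tensorflow as tf

# Dummy stand-ins for the tensors built above: 4 sentences padded to length 8.
input_ids_t = np.random.randint(1, 100, size=(4, 8)).astype("int32")
attention_masks_t = np.ones((4, 8), dtype="int32")
labels_t = np.array([0, 1, 1, 0])

# Nest both model inputs in one dict so each element is (inputs, targets).
train_data = tf.data.Dataset.from_tensor_slices(
    ({"input_ids": input_ids_t, "attention_mask": attention_masks_t}, labels_t)
)
train_data = train_data.shuffle(4).batch(2)

for inputs, labels in train_data.take(1):
    print(inputs["input_ids"].shape, inputs["attention_mask"].shape, labels.shape)
```

The same layout applies to the test data. Hugging Face TF models accept a dict keyed by input names; a functional Keras model would instead need Input layers named input_ids and attention_mask.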

It sounds like you are feeding transformer_model one input instead of three. Try removing the square brackets around transformer_model([input_ids, input_mask, segment_ids])[0] so that it reads transformer_model(input_ids, input_mask, segment_ids)[0]. That way, the function receives three arguments instead of a single list.

Related

Training loss Validation loss all 0

I'm trying to fine-tune a T5 model on my own dataset for grammatical error correction, but when I run the model I keep getting all 0's in my results. I'm following the Hugging Face translation tutorial.
[screenshot of the results]
I think it's a problem with the preprocess function, but I can't figure out why:
prefix = ''
max_input_length = 128
max_target_length = 128
source_lang = "ar"
target_lang = "ar"
def preprocess_function(examples):
    # tokenizer is created earlier, e.g. with AutoTokenizer.from_pretrained(...)
    inputs = [prefix + ex for ex in examples["original"]]
    targets = [ex for ex in examples["corrected"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)
    # Set up the tokenizer for targets (recent transformers versions deprecate
    # as_target_tokenizer() in favour of tokenizer(text_target=targets, ...))
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=max_target_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
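preprocess_function leaves the label sequences unpadded. In the Hugging Face translation examples that padding is left to DataCollatorForSeq2Seq, which pads labels with -100 by default; -100 is also the default ignore_index of PyTorch's CrossEntropyLoss, so those positions contribute nothing to the loss. The hypothetical helper pad_labels below only imitates that padding step in plain Python to show what a padded label batch looks like; it is not the collator itself, and the token ids are toy values.

```python
from typing import List


def pad_labels(labels: List[List[int]], pad_value: int = -100) -> List[List[int]]:
    """Right-pad every label sequence to the longest one in the batch.

    Hypothetical helper imitating how a seq2seq collator pads labels;
    pad_value=-100 marks positions the loss should ignore.
    """
    longest = max(len(seq) for seq in labels)
    return [seq + [pad_value] * (longest - len(seq)) for seq in labels]


batch = [[71, 12, 1], [55, 1]]  # toy token ids, 1 playing the role of an end-of-sequence id
padded = pad_labels(batch)
print(padded)  # [[71, 12, 1], [55, 1, -100]]

# Count the label positions that would actually contribute to the loss.
real_tokens = sum(tok != -100 for seq in padded for tok in seq)
print(real_tokens)  # 5
```

The same -100 value is why a compute_metrics function has to map it back to the pad token id before calling batch_decode.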

Properly evaluate a test dataset

I trained a machine translation model using the Hugging Face library:
import numpy as np
def compute_metrics(eval_preds):
    # tokenizer, metric and postprocess_text are defined earlier in the notebook
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # Some simple post-processing
    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)
    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    result = {"bleu": result["score"]}
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    result["gen_len"] = np.mean(prediction_lens)
    result = {k: round(v, 4) for k, v in result.items()}
    return result
trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
trainer.train()
model_dir = './models/'
trainer.save_model(model_dir)
The code above is taken from this Google Colab notebook. After training, I can see that the trained model is saved to the models folder and the metric is calculated. Now I want to load the trained model and run prediction on a new dataset. Here is what I tried:
dataset = load_dataset('csv', data_files='data/training_data.csv')
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
# Tokenize the test dataset
tokenized_datasets = train_test.map(preprocess_function_v2, batched=True)
test_dataset = tokenized_datasets['test']
model = AutoModelForSeq2SeqLM.from_pretrained('models')
model(test_dataset)
It threw the following error:
*** AttributeError: 'Dataset' object has no attribute 'size'
I tried the evaluate() function as well, but it said:
*** torch.nn.modules.module.ModuleAttributeError: 'MarianMTModel' object has no attribute 'evaluate'
And calling eval() only prints the model's configuration.
What is the proper way to evaluate the performance of the trained model on a new dataset?
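It turned out that predictions can be produced with the following code:
# the model and the tokenized inputs must be on the same device ('cuda' here)
inputs = tokenizer(
    questions,
    max_length=max_input_length,
    truncation=True,
    return_tensors='pt',
    padding=True).to('cuda')
translation = model.generate(**inputs)
# turn the generated token ids back into text
decoded = tokenizer.batch_decode(translation, skip_special_tokens=True)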
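To run this over a whole new dataset rather than one batch, the texts can be processed in fixed-size chunks so memory use stays bounded, and the decoded outputs collected in order for scoring. The sketch below assumes a hypothetical translate_batch callable that would wrap the tokenizer call, model.generate and tokenizer.batch_decode; a stub stands in for it so the loop runs on its own.

```python
from typing import Callable, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def predict_all(texts: Sequence[str],
                translate_batch: Callable[[List[str]], List[str]],
                batch_size: int = 16) -> List[str]:
    """Run a (hypothetical) batch translation function over every text, keeping order."""
    predictions: List[str] = []
    for batch in chunks(texts, batch_size):
        predictions.extend(translate_batch(list(batch)))
    return predictions


# Stub standing in for tokenizer(...) -> model.generate(...) -> tokenizer.batch_decode(...)
def fake_translate(batch: List[str]) -> List[str]:
    return [text.upper() for text in batch]


texts = ["hallo welt", "guten morgen", "danke"]
print(predict_all(texts, fake_translate, batch_size=2))
# ['HALLO WELT', 'GUTEN MORGEN', 'DANKE']
```

With a real model, the list returned by predict_all and the reference translations could then be passed to metric.compute(predictions=..., references=...) in the same way as in compute_metrics.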

XAI for transformer custom model using AllenNLP

I am working on an NER task for a Vietnamese dataset with 15 tags in IO format. I am using the AllenNLP Interpret toolkit for my model, but I cannot configure it completely.
I use the pre-trained language model "xlm-roberta-base" from HuggingFace. I concatenate the last 4 BERT layers and pass the result through a linear layer. The model architecture is shown in the source below.
import torch
import torch.nn as nn
class BaseBertSoftmax(nn.Module):
    def __init__(self, model, drop_out, num_labels):
        super(BaseBertSoftmax, self).__init__()
        self.num_labels = num_labels
        self.model = model
        self.dropout = nn.Dropout(drop_out)
        self.classifier = nn.Linear(4*768, num_labels)  # last 4 layers concatenated (768 = hidden size)
    def forward_custom(self, input_ids, attention_mask=None,
                       labels=None, head_mask=None):
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
        sequence_output = torch.cat((outputs[1][-1], outputs[1][-2], outputs[1][-3], outputs[1][-4]), -1)
        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)  # bsz, seq_len, num_labels
        outputs = (logits,) + outputs[2:]  # add hidden states and attention if they are here
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss(ignore_index=0)
            if attention_mask is not None:
                # only compute the loss on positions that are not padding
                active_loss = attention_mask.view(-1) == 1
                active_logits = logits.view(-1, self.num_labels)[active_loss]
                active_labels = labels.view(-1)[active_loss]
                loss = loss_fct(active_logits, active_labels)
            else:
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            outputs = (loss,) + outputs
        return outputs  # (loss), scores, (hidden_states), (attentions)
What steps do I have to take to integrate this model into AllenNLP Interpret?
Could you please help me with this problem?
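Independently of AllenNLP, the two less obvious steps in forward_custom can be checked in isolation. The NumPy sketch below uses random arrays with made-up sizes in place of real xlm-roberta-base outputs (13 hidden-state tensors: embeddings plus 12 layers). It concatenates the last four hidden-state tensors along the feature axis, which is why the classifier takes 4*768 inputs, and then computes a mean cross-entropy only over positions where attention_mask == 1, skipping label 0 as ignore_index=0 does.

```python
import numpy as np

rng = np.random.default_rng(0)
bsz, seq_len, hidden, num_labels = 2, 5, 768, 15

# Random stand-ins for the per-layer hidden states (embeddings + 12 layers).
hidden_states = [rng.normal(size=(bsz, seq_len, hidden)) for _ in range(13)]
sequence_output = np.concatenate(
    (hidden_states[-1], hidden_states[-2], hidden_states[-3], hidden_states[-4]), axis=-1
)  # (bsz, seq_len, 4 * hidden)

# Random linear classifier in place of nn.Linear(4*768, num_labels).
W = rng.normal(scale=0.01, size=(4 * hidden, num_labels))
logits = sequence_output @ W  # (bsz, seq_len, num_labels)

labels = np.array([[3, 5, 0, 0, 0], [7, 2, 9, 0, 0]])
attention_mask = np.array([[1, 1, 0, 0, 0], [1, 1, 1, 1, 0]])


def masked_cross_entropy(logits, labels, attention_mask, ignore_index=0):
    """Mean cross-entropy over non-padding positions whose label != ignore_index."""
    flat_logits = logits.reshape(-1, logits.shape[-1])
    flat_labels = labels.reshape(-1)
    active = attention_mask.reshape(-1) == 1          # drop padding positions
    flat_logits, flat_labels = flat_logits[active], flat_labels[active]
    keep = flat_labels != ignore_index                # mimic ignore_index
    flat_logits, flat_labels = flat_logits[keep], flat_labels[keep]
    # log-softmax with the row maximum subtracted for numerical stability
    shifted = flat_logits - flat_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(flat_labels)), flat_labels].mean()


loss = masked_cross_entropy(logits, labels, attention_mask)
print(sequence_output.shape, logits.shape, loss)
```

In the second sentence, position 3 is inside the attention mask but carries label 0, so it is dropped by the ignore_index step rather than by the mask.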

Keras Graph disconnected cannot obtain value for tensor KerasTensor

TensorFlow: 2.4.0
This is the full error message:
ValueError: Graph disconnected: cannot obtain value for tensor KerasTensor(type_spec=TensorSpec(shape=(None, 64, 64, 3), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'") at layer "flatten". The following previous layers were accessed without issue: []
I have been trying to make a controllable autoencoder where I have 10 features I can vary to get an image (64x64 RGB).
I have been having trouble getting it working. I want to separate the neural network into a full model, which I can fit, and a decoder, which I can use after training to pass values into and generate images.
By the way, I know this is not the perfect way to do an autoencoder; it's just the simplest one I can think of.
import numpy as np
from tensorflow.keras.layers import Dense, Flatten, Input, Reshape
from tensorflow.keras.models import Model
def Create_Generator(Image_Shape):
    Input_Layer = Input(shape=Image_Shape)
    Flatten_Layer1 = Flatten()(Input_Layer)
    Dense_Layer1 = Dense(12288, activation="relu")(Flatten_Layer1)
    Dense_Layer2 = Dense(6144, activation="relu")(Dense_Layer1)
    Dense_Layer3 = Dense(1024, activation="relu")(Dense_Layer2)
    Dense_Layer4 = Dense(10, activation="relu")(Dense_Layer3)
    Dense_Layer5 = Dense(1024, activation="relu")(Dense_Layer4)
    Dense_Layer6 = Dense(6144, activation="relu")(Dense_Layer5)
    Dense_Layer7 = Dense(12288, activation="relu")(Dense_Layer6)
    Reshape_Layer = Reshape(Image_Shape)(Dense_Layer7)
    AutoEncoder = Model(Input_Layer, Reshape_Layer)
    AutoEncoder.compile(optimizer='adam', loss='binary_crossentropy')
    encoded_input = Input(shape=(10,))
    Decoder = Model([encoded_input, Dense_Layer5, Dense_Layer6, Dense_Layer7], Reshape_Layer)
    return AutoEncoder, Decoder
data = np.load("data.npz")
X_train = data['X']
AutoEncoder, Decoder = Create_Generator((64, 64, 3))
# Just for testing if it works
print(AutoEncoder.predict([X_train[0]]))
print(Decoder([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]))
I think you have an error here:
Decoder = Model([encoded_input,Dense_Layer5,Dense_Layer6,Dense_Layer7],Reshape_Layer)
Dense_Layer5, Dense_Layer6 and Dense_Layer7 are intermediate tensors produced by Dense layers, not tf.keras.layers.Input tensors, so they cannot be used as model inputs. You cannot create Decoder this way.
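One way around this, sketched here as an illustration rather than the only fix, is to keep references to the decoder's Dense and Reshape layer objects (not just their output tensors) and call them again on a fresh Input(shape=(10,)). Because the same layer objects are called twice, the decoder shares their weights with the autoencoder, so fitting the autoencoder also trains the decoder. The hidden sizes are shrunk (the mid and small parameters stand in for 6144 and 1024) and a 4x4x3 image is used so the sketch builds quickly; the function and parameter names are illustrative.

```python
import numpy as np
from tensorflow.keras.layers import Dense, Flatten, Input, Reshape
from tensorflow.keras.models import Model


def create_generator(image_shape, mid=24, small=16, latent_dim=10):
    """Return (autoencoder, decoder) that share the decoder layers.

    mid and small stand in for the original 6144 and 1024 units; they are
    shrunk here only so the sketch builds quickly.
    """
    flat = int(np.prod(image_shape))  # 12288 for a 64x64x3 image

    # Encoder: same layout as in the question.
    image_input = Input(shape=image_shape)
    x = Flatten()(image_input)
    x = Dense(flat, activation="relu")(x)
    x = Dense(mid, activation="relu")(x)
    x = Dense(small, activation="relu")(x)
    code = Dense(latent_dim, activation="relu")(x)

    # Decoder: keep the layer objects, not only the tensors they produce.
    decoder_layers = [
        Dense(small, activation="relu"),
        Dense(mid, activation="relu"),
        Dense(flat, activation="relu"),
        Reshape(image_shape),
    ]

    def decode(tensor):
        for layer in decoder_layers:
            tensor = layer(tensor)
        return tensor

    autoencoder = Model(image_input, decode(code))
    autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

    # Calling the same layers on a new Input yields a decoder with shared weights.
    encoded_input = Input(shape=(latent_dim,))
    decoder = Model(encoded_input, decode(encoded_input))
    return autoencoder, decoder


autoencoder, decoder = create_generator((4, 4, 3))
images = decoder.predict(np.ones((1, 10)), verbose=0)
print(images.shape)  # (1, 4, 4, 3)
```

After training with autoencoder.fit(X_train, X_train, ...), decoder.predict on a batch of ten feature values generates images directly from the bottleneck.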

How to adjust transparency (alpha) in seaborn swarmplot?

I have a swarmplot:
sns.swarmplot(y="age gap corr", x="cluster",
              data=scatter_data, hue='group', dodge=True)
and I would like to adjust the transparency of the dots. I tried:
sns.swarmplot(y="age gap corr", x="cluster",
              data=scatter_data, hue='group', dodge=True,
              scatter_kws={'alpha': 0.1})
and:
sns.swarmplot(y="age gap corr", x="cluster",
              data=scatter_data, hue='group', dodge=True,
              plot_kws={'scatter_kws': {'alpha': 0.1}})
but neither of these methods works.
Any help is appreciated.
You can simply input the alpha argument directly in the swarmplot function:
import seaborn as sns
df = sns.load_dataset('diamonds').sample(1000)
sns.swarmplot(data=df, x='cut', y='carat', hue='color', alpha=0.5)
The documentation for swarmplot states
kwargs : key, value mappings
Other keyword arguments are passed through to matplotlib.axes.Axes.scatter().
Thus, you don't need to use scatter_kws={...}.
Compare this to, e.g., sns.lmplot, which states
{scatter,line}_kws : dictionaries
Additional keyword arguments to pass to plt.scatter and plt.plot.
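Applied to the call from the question, that means passing alpha next to hue and dodge. The sketch below builds a small synthetic scatter_data frame with the same column names (the values are random placeholders, so it runs offline, unlike sns.load_dataset) and reads the alpha back from the non-empty scatter collections that swarmplot draws.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic placeholder data using the column names from the question.
rng = np.random.default_rng(0)
scatter_data = pd.DataFrame({
    "age gap corr": rng.normal(size=60),
    "cluster": rng.choice(["A", "B", "C"], size=60),
    "group": rng.choice(["x", "y"], size=60),
})

ax = sns.swarmplot(y="age gap corr", x="cluster",
                   data=scatter_data, hue="group", dodge=True, alpha=0.1)

# Each point collection drawn by swarmplot should carry the requested alpha;
# empty collections (e.g. legend helpers in some seaborn versions) are skipped.
point_collections = [c for c in ax.collections if len(c.get_offsets()) > 0]
alphas = {c.get_alpha() for c in point_collections}
print(alphas)
plt.close(ax.figure)
```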
