Hugging Face transformers: training loss sometimes decreases really slowly (using Trainer) - sentiment-analysis
I'm fine-tuning a sentiment-analysis model on news data. Since the simplest approach is to start from a Hugging Face pre-trained model (roberta-base), I followed the Hugging Face tutorial at https://huggingface.co/blog/sentiment-analysis-python.
The custom input data is simple: there are 2 columns named 'text' and 'labels'. The 'text' column contains news sentences and the 'labels' column contains '0' (40%) and '1' (60%). The data was then split into train, eval, and test sets.
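To double-check the class balance in each split, something like the following could be used. It is only a sketch: `label_shares` and the toy rows are hypothetical stand-ins, and it assumes each record is a dict with the 'text' and 'labels' keys described above.

```python
from collections import Counter


def label_shares(records):
    """Return {label: fraction of rows} for a list of {'text', 'labels'} rows."""
    counts = Counter(row["labels"] for row in records)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical toy rows standing in for one split of the news data
toy_split = [
    {"text": "Shares rally after earnings beat", "labels": 1},
    {"text": "Factory output falls for third month", "labels": 0},
    {"text": "Company raises full-year guidance", "labels": 1},
    {"text": "Regulator opens probe into lender", "labels": 0},
    {"text": "Retail sales climb ahead of holidays", "labels": 1},
]

shares = label_shares(toy_split)
print(shares)
```

Running the same check on the train, eval, and test splits would show whether all three keep roughly the 40/60 balance.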
Here is the problem I ran into: 'eval_loss' barely changes during training, yet accuracy is above 50%. The training loss decreases during training, so the model seems to have learned something. Maybe it stopped learning after the first epoch, or the best checkpoint was selected automatically, but I'm confused about what actually happened.
This is the training code (without the labeling code):
from datasets import load_dataset
from transformers import AutoTokenizer
from transformers import DataCollatorWithPadding
from transformers import AutoModelForSequenceClassification
import numpy as np
from datasets import load_metric
from transformers import set_seed
set_seed(42)
dataset = load_dataset('json', data_files={'train': './data/labeled_news/labeled_news_heads_train.json',
                                           'eval': './data/labeled_news/labeled_news_heads_eval.json'}, field='data')
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
train_dataset = tokenized_datasets["train"].shuffle(seed=42)
eval_dataset = tokenized_datasets["eval"].shuffle(seed=42)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
def compute_metrics(eval_pred):
    load_accuracy = load_metric("accuracy")
    load_f1 = load_metric("f1")
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = load_accuracy.compute(predictions=predictions, references=labels)["accuracy"]
    f1 = load_f1.compute(predictions=predictions, references=labels)["f1"]
    return {"accuracy": accuracy, "f1": f1}
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback
repo_name = "Direct_v1"
training_args = TrainingArguments(
    output_dir=repo_name,
    learning_rate=2e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=1,
    num_train_epochs=5,
    weight_decay=0.01,
    save_strategy="steps",
    evaluation_strategy="steps",
    eval_steps=250,
    save_steps=250,
    push_to_hub=False,
    save_total_limit=5,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
trainer.train()
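On the checkpoint question: as far as I understand the documentation, `Trainer` only reloads the best checkpoint at the end of training when `load_best_model_at_end=True` is set, and the arguments above leave it at its default of `False`. Below is a minimal sketch of the extra arguments that would enable it, kept as a plain dict so it runs without the library. `best_checkpoint_kwargs` is a hypothetical name, the step values are copied from the code above, and newer library versions may spell `evaluation_strategy` differently.

```python
# Hypothetical dict of extra keyword arguments for TrainingArguments that ask
# Trainer to reload the best checkpoint once training finishes.
best_checkpoint_kwargs = {
    "evaluation_strategy": "steps",        # eval and save must use the same strategy
    "save_strategy": "steps",
    "eval_steps": 250,
    "save_steps": 250,                     # must be a round multiple of eval_steps
    "load_best_model_at_end": True,
    "metric_for_best_model": "eval_loss",  # metric used to rank checkpoints
    "greater_is_better": False,            # lower eval_loss is better
}

# Checkpoints only line up with evaluations if save_steps is a multiple of eval_steps.
steps_aligned = (
    best_checkpoint_kwargs["save_steps"] % best_checkpoint_kwargs["eval_steps"] == 0
)
print("save/eval steps aligned:", steps_aligned)

# Intended use (not executed here, it needs transformers installed):
# training_args = TrainingArguments(output_dir=repo_name, **best_checkpoint_kwargs)
```

Even with this enabled, the best checkpoint can only be chosen among those still on disk, so `save_total_limit` matters as well.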
And this is the result printed on console:
Using custom data configuration default-e08b7987c7aa36c3
Reusing dataset json (/home/nvme20142249/.cache/huggingface/datasets/json/default-e08b7987c7aa36c3/0.0.0/ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b)
100%|██████████| 2/2 [00:00<00:00, 315.56it/s]
Loading cached processed dataset at /home/nvme20142249/.cache/huggingface/datasets/json/default-e08b7987c7aa36c3/0.0.0/ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b/cache-050035fb0e59db40.arrow
Loading cached processed dataset at /home/nvme20142249/.cache/huggingface/datasets/json/default-e08b7987c7aa36c3/0.0.0/ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b/cache-2981b391c69b5e0c.arrow
Loading cached shuffled indices for dataset at /home/nvme20142249/.cache/huggingface/datasets/json/default-e08b7987c7aa36c3/0.0.0/ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b/cache-26ea42ee0127a8d9.arrow
Loading cached shuffled indices for dataset at /home/nvme20142249/.cache/huggingface/datasets/json/default-e08b7987c7aa36c3/0.0.0/ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b/cache-ef064a1251721c99.arrow
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'roberta.pooler.dense.bias', 'roberta.pooler.dense.weight', 'lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The following columns in the training set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
/home/nvme20142249/PycharmProjects/StockPrediction/venv/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
***** Running training *****
Num examples = 10147
Num Epochs = 5
Instantaneous batch size per device = 24
Total train batch size (w. parallel, distributed & accumulation) = 24
Gradient Accumulation steps = 1
Total optimization steps = 2115
12%|█▏ | 250/2115 [02:04<15:33, 2.00it/s]The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
100%|██████████| 634/634 [00:14<00:00, 53.32it/s]
Saving model checkpoint to Direct_v1/checkpoint-250
Configuration saved in Direct_v1/checkpoint-250/config.json
{'eval_loss': 0.6686041951179504, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 14.2853, 'eval_samples_per_second': 44.381, 'eval_steps_per_second': 44.381, 'epoch': 0.59}
Model weights saved in Direct_v1/checkpoint-250/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-250/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-250/special_tokens_map.json
24%|██▎ | 500/2115 [04:28<14:23, 1.87it/s]The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
{'loss': 0.6803, 'learning_rate': 1.5271867612293146e-05, 'epoch': 1.18}
24%|██▎ | 500/2115 [04:43<14:23, 1.87it/s]
100%|██████████| 634/634 [00:15<00:00, 49.78it/s]
Saving model checkpoint to Direct_v1/checkpoint-500
Configuration saved in Direct_v1/checkpoint-500/config.json
{'eval_loss': 0.6686403751373291, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 15.0809, 'eval_samples_per_second': 42.04, 'eval_steps_per_second': 42.04, 'epoch': 1.18}
Model weights saved in Direct_v1/checkpoint-500/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-500/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-500/special_tokens_map.json
35%|███▌ | 750/2115 [06:56<11:30, 1.98it/s]
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
35%|███▌ | 750/2115 [07:10<11:30, 1.98it/s]
100%|██████████| 634/634 [00:14<00:00, 51.95it/s]
Saving model checkpoint to Direct_v1/checkpoint-750
Configuration saved in Direct_v1/checkpoint-750/config.json
{'eval_loss': 0.6685948967933655, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 14.3642, 'eval_samples_per_second': 44.138, 'eval_steps_per_second': 44.138, 'epoch': 1.77}
Model weights saved in Direct_v1/checkpoint-750/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-750/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-750/special_tokens_map.json
47%|████▋ | 1000/2115 [09:18<09:18, 2.00it/s]
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
{'loss': 0.6786, 'learning_rate': 1.054373522458629e-05, 'epoch': 2.36}
47%|████▋ | 1000/2115 [09:32<09:18, 2.00it/s]
100%|██████████| 634/634 [00:14<00:00, 52.47it/s]
Saving model checkpoint to Direct_v1/checkpoint-1000
Configuration saved in Direct_v1/checkpoint-1000/config.json
{'eval_loss': 0.6686900854110718, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 14.7566, 'eval_samples_per_second': 42.964, 'eval_steps_per_second': 42.964, 'epoch': 2.36}
Model weights saved in Direct_v1/checkpoint-1000/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-1000/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-1000/special_tokens_map.json
59%|█████▉ | 1250/2115 [11:40<07:14, 1.99it/s]
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
59%|█████▉ | 1250/2115 [11:54<07:14, 1.99it/s]
100%|██████████| 634/634 [00:14<00:00, 52.63it/s]
Saving model checkpoint to Direct_v1/checkpoint-1250
Configuration saved in Direct_v1/checkpoint-1250/config.json
{'eval_loss': 0.6696870923042297, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 14.2725, 'eval_samples_per_second': 44.421, 'eval_steps_per_second': 44.421, 'epoch': 2.96}
Model weights saved in Direct_v1/checkpoint-1250/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-1250/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-1250/special_tokens_map.json
71%|███████ | 1500/2115 [14:01<05:09, 1.99it/s]
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
{'loss': 0.6798, 'learning_rate': 5.815602836879432e-06, 'epoch': 3.55}
71%|███████ | 1500/2115 [14:16<05:09, 1.99it/s]
100%|██████████| 634/634 [00:14<00:00, 52.17it/s]
Saving model checkpoint to Direct_v1/checkpoint-1500
Configuration saved in Direct_v1/checkpoint-1500/config.json
{'eval_loss': 0.6706184148788452, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 14.5084, 'eval_samples_per_second': 43.699, 'eval_steps_per_second': 43.699, 'epoch': 3.55}
Model weights saved in Direct_v1/checkpoint-1500/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-1500/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-1500/special_tokens_map.json
Deleting older checkpoint [Direct_v1/checkpoint-250] due to args.save_total_limit
83%|████████▎ | 1750/2115 [16:25<03:03, 1.99it/s]
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
83%|████████▎ | 1750/2115 [16:39<03:03, 1.99it/s]
100%|██████████| 634/634 [00:14<00:00, 50.95it/s]
Saving model checkpoint to Direct_v1/checkpoint-1750
Configuration saved in Direct_v1/checkpoint-1750/config.json
{'eval_loss': 0.6691468954086304, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 14.515, 'eval_samples_per_second': 43.679, 'eval_steps_per_second': 43.679, 'epoch': 4.14}
Model weights saved in Direct_v1/checkpoint-1750/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-1750/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-1750/special_tokens_map.json
Deleting older checkpoint [Direct_v1/checkpoint-500] due to args.save_total_limit
95%|█████████▍| 2000/2115 [18:48<00:58, 1.95it/s]
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
{'loss': 0.6784, 'learning_rate': 1.087470449172577e-06, 'epoch': 4.73}
95%|█████████▍| 2000/2115 [19:04<00:58, 1.95it/s]
100%|██████████| 634/634 [00:15<00:00, 50.16it/s]
Saving model checkpoint to Direct_v1/checkpoint-2000
Configuration saved in Direct_v1/checkpoint-2000/config.json
{'eval_loss': 0.6719586253166199, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 15.2941, 'eval_samples_per_second': 41.454, 'eval_steps_per_second': 41.454, 'epoch': 4.73}
Model weights saved in Direct_v1/checkpoint-2000/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-2000/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-2000/special_tokens_map.json
Deleting older checkpoint [Direct_v1/checkpoint-750] due to args.save_total_limit
100%|██████████| 2115/2115 [20:05<00:00, 2.05it/s]
Training completed. Do not forget to share your model on huggingface.co/models =)
100%|██████████| 2115/2115 [20:05<00:00, 1.75it/s]
{'train_runtime': 1205.4397, 'train_samples_per_second': 42.088, 'train_steps_per_second': 1.755, 'train_loss': 0.6791386345035922, 'epoch': 5.0}
I find this quite strange: the model seems to have learned something, yet eval_loss does not change during training. Does 'transformers.Trainer' select the best checkpoint automatically? I can't tell whether this is an error or not.
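One way to read the log above: eval_accuracy and eval_f1 are identical to every digit at each evaluation, which is what a classifier that always predicts label 1 would produce. The sketch below checks that arithmetic. It assumes 387 of the 634 eval examples are labeled 1, a count inferred from the reported accuracy rather than stated anywhere in the post, and `constant_prediction_metrics` is a hypothetical helper name.

```python
import math


def constant_prediction_metrics(n_total, n_positive):
    """Metrics of a classifier that always predicts the positive class (label 1)."""
    tp = n_positive            # every positive example is predicted correctly
    fp = n_total - n_positive  # every negative example is predicted as positive
    fn = 0                     # nothing is ever predicted as negative
    accuracy = tp / n_total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Cross-entropy when the model outputs the class prior for every example
    p = n_positive / n_total
    prior_loss = -(p * math.log(p) + (1 - p) * math.log(1 - p))
    return accuracy, f1, prior_loss


# 634 eval examples (from the log); 387 positives is an inferred assumption
accuracy, f1, prior_loss = constant_prediction_metrics(634, 387)
print(accuracy, f1, prior_loss)
```

Under that assumption the accuracy and F1 reproduce the logged 0.6104 and 0.7581, and the prior-only loss comes out close to the logged eval_loss of about 0.669. That would be consistent with the model outputting roughly the class proportions for every input instead of using the text.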
** Edited on 4/25: I changed the compute_metrics function to
load_accuracy = load_metric("accuracy")
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return load_accuracy.compute(predictions=predictions, references=labels)
and the training error decreased normally during training. I thought the problem was solved, but sometimes it isn't: on some runs with the same datasets (different checkpoints) the training error does not decrease. Why does this happen?
Related
Getting The Actual Video Label After Model.Predict Operations With 3DCNN Sequential Model
I have a challenge and I am trying to solve this in order to move forward, it is the final piece of the puzzle for my model operations. What I am trying to do?:* is verify the videos that are being used in the Xval_test variable via the split operations here as per the example via here In Python sklearn, how do I retrieve the names of samples/variables in test/training data? : X_train, Xval_test, Y_train, Yval_test = train_test_split( X, Y, train_size=0.8, test_size=0.2, random_state=1, shuffle=True) 1. What I tried?: is calling the name from the actual tag via file_path name, however that is not working. (every time the code runs the names from the file path are taken and not from the actual split operations Xval_test variable. This causes an issue during the model.fit() procedures as it changes the 1D flattened tensor to (a number of rows, 1 column) file_paths = [] for file_name in os.listdir(root): file_path = os.path.join(root, file_name) if os.path.isfile(file_path): file_paths.append(file_path) print('**********************************************************') print('ALL Directory File Paths Completed', file_paths) I am not sure if the files are being shuffled properly with my weak attempt as per the guidelines from the split() forum. (based on my knowledge, every time I run the code, those files would be shuffled to a new Xval_test set relative to it's specified split parameter 80:20. 2. I tried calling the model.predict(), that presents no labels for which I was hoping that it did (maybe I am using it the wrong way for calling the indices, I don't know). my_pred = model.predict(Xval_test).argmax(axis=1) I tried calling np.argsmax():( I KNOW THE TOTAL AMT OF FILES IN Xval_test is 16 based on the split()) Y_valpred = np.argmax(model.predict(Xval_test), axis=1) # model This returns just the class label and not it's contents e.g. 
the classes in the datastore are folders containing (walking and fencing) rather than the actual video labels such as (walking0.avi....100/n and fencing0.avi.....100n/) !!!???! I am not certain of the operation to get the folder content's tags, the actual file itself. It is this that I am trying to get from the X_test variable. (or maybe its the wrong variable or functioning I using, again I am lacking the knowledge to understand this, please assist so that I can move to the next stage). 3. I tried printing all of the variable's from the previous operations to see where that name tag would be stored and it is stored in the name variable below as per my operations: (but how do I call these folder content's file tags forward to the X_test variable or as per my choice the model.predict() outputs in a column together with the other metrics. So far, this causes issues with the model.fit() function???) for files3 in files2: name = os.path.join(namelist, files3) name1 = name.strip("./dataset/") name2 = name1.strip("Fencing/") name3 = name2.strip("Stabing/") name3 = name3.replace('.av', '') name4 = name3.split() # print("This is name1 ", name1) # name5 = pd.DataFrame({"vid_names": name4}).to_csv("results.csv") # name1 = name1.replace('[]', '') with open('vid_names.csv', 'a',newline='') as f: writer = csv.writer(f) writer = writer.writerow(name4) # print("My Video Names => ", name3) 3A. Thank you in advance, I am grateful for any guidance provided, Please assist! QUESTIONS: ############################################ Ques: 1. Is it possible to see what video label tags are segmented within the X_Test Variable? Ques: 1A. If yes, may I request your guidance here, please, on how this can be done?: I have been researching for weeks and cannot seem to get this sorted, your efforts would be greatly appreciated. Ques: 2. MY Expected OUTCOME: I am trying to access the prediction. 
So, In the end I would get an output relative to the actual video tag that insinuates the actual video that was used in the prediction operation along with its class tag (see below): Initially, the model.predict() operations outputs numerical data relative to the class label. I am trying to access the actual file label as well: For example, what I want the predictions to look like is as follows: X_test_labs Pred_labs Actual_File Pred_Score 0 Fencing Fencing fencing0.avi 0.99650866 1 Walking Fencing walking6.avi 0.9948837 2 Walking Walking walking21.avi 0.9967557 3 Fencing Fencing fencing32.avi 0.9930409 4 Walking Fencing walking43.avi 0.9961387 5 Walking Walking walking48.avi 0.6467387 6 Walking Walking walking50.avi 0.5465369 7 Walking Walking walking9.avi 0.3478027 8 Fencing Fencing fencing22.avi 0.1247543 9 Fencing Fencing fencing46.avi 0.7477777 10 Walking Walking walking37.avi 0.8499399 11 Fencing Fencing fencing19.avi 0.8887722 12 Walking Walking walking12.avi 0.7775351 13 Fencing Fencing fencing33.avi 0.4323323 14 Fencing Fencing fencing51.avi 0.7812434 15 Fencing Fencing fencing8.avi 0.8723476 I am not sure how to achieve this task, this one is a little more tricky for me than anticipated This is my code* '''*******Load Dependencies********''' from keras.regularizers import l2 from keras.layers import Dense from keras_tqdm import TQDMNotebookCallback from tqdm.keras import TqdmCallback from tensorflow import keras from tensorflow.keras.preprocessing.image import ImageDataGenerator import math import tensorflow as tf from tqdm import tqdm import videoto3d import seaborn as sns import scikitplot as skplt from sklearn import preprocessing from sklearn.metrics import classification_report, confusion_matrix from sklearn.model_selection import train_test_split from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, 
recall_score, f1_score from keras.utils.vis_utils import plot_model from keras.utils import np_utils from tensorflow.keras.optimizers import Adam from keras.models import Sequential from keras.losses import categorical_crossentropy from keras.layers import (Activation, Conv3D, Dense, Dropout, Flatten,MaxPooling3D) import pandas as pd import numpy as np import matplotlib.pyplot as plt import os import argparse import time import sys import openpyxl import os import re import csv from keras import models import cv2 import pickle import glob from numpy import load np.seterr(divide='ignore', invalid='ignore') print('**********************************************************') print('Graphical Representation Of Accuracy & Validation Results Completed') def plot_history(history, result_dir): plt.plot(history.history['val_accuracy'], marker='.') plt.plot(history.history['accuracy'], marker='.') plt.title('model accuracy') plt.xlabel('epoch') plt.ylabel('accuracy') plt.grid() plt.legend(['Val_acc', 'Test_acc'], loc='lower right') plt.savefig(os.path.join(result_dir, 'model_accuracy.png')) plt.close() plt.plot(history.history['val_loss'], marker='.') plt.plot(history.history['loss'], marker='.') plt.title('model Loss') plt.xlabel('epoch') plt.ylabel('loss') plt.grid() plt.legend(['Val_loss', 'Test_loss'], loc='upper right') plt.savefig(os.path.join(result_dir, 'model_loss.png')) plt.close() # Saving History Accuracy & Validation Acuuracy Results To Directory print('**********************************************************') print('Generating History Acuuracy Results Completed') def save_history(history, result_dir): loss = history.history['loss'] acc = history.history['accuracy'] val_loss = history.history['val_loss'] val_acc = history.history['val_accuracy'] nb_epoch = len(acc) # Creating The Results File To Directory = Store Results print('**********************************************************') print('Saving History Acuuracy Results To Directory Completed') with 
open(os.path.join(result_dir, 'result.txt'), 'w') as fp: fp.write('epoch\tloss\tacc\tval_loss\tval_acc\n') # print(fp) for i in range(nb_epoch): fp.write('{}\t{}\t{}\t{}\t{}\n'.format( i, loss[i], acc[i], val_loss[i], val_acc[i])) print('**********************************************************') print('Loading All Specified Video Data Samples From Directory Completed') def loaddata(video_dir, vid3d, nclass, result_dir, color=False, skip=True): files = os.listdir(video_dir) with open('files.csv', 'w') as f: writer = csv.writer(f) writer.writerow(files) root = '/Users/symbadian/3DCNN_latest_Version/3DCNNtesting/dataset/' dirlist = [item for item in os.listdir( root) if os.path.isdir(os.path.join(root, item))] print('Get the filesname and path') print('DIRLIST Directory Completed', dirlist) file_paths = [] for file_name in os.listdir(root): file_path = os.path.join(root, file_name) if os.path.isfile(file_path): file_paths.append(file_path) print('**********************************************************') print('ALL Directory File Paths Completed', file_paths) roots, dirsy, fitte = next(os.walk(root), ([], [], [])) print('**********************************************************') print('ALL Directory ROOTED', roots, fitte, dirsy) X = [] print('X labels==>', X) # This stores all variable data in an object format labellist = [] pbar = tqdm(total=len(files)) # generate progress bar for file processing print('**********************************************************') print('Generating/Join Class Labels For Video Dataset For Input Completed') # Accessing files and labels from dataset directory for filename in files: pbar.update(1) if filename == '.DS_Store':#.DS_Store continue namelist = os.path.join(video_dir, filename) files2 = os.listdir(namelist) ############################################################################### ######### NEEDS TO FIX THIS Data Adding to CSV Rather Than REWRITTING ######### for files3 in files2: name = os.path.join(namelist, files3) 
#Call a function that extract the frames details of all file names label = vid3d.get_UCF_classname(filename) if label not in labellist: if len(labellist) >= nclass: continue labellist.append(label) # This X variable is the point where the lables are store (I think??!?!) X.append(vid3d.video3d(name, color=color, skip=skip)) pbar.close() # generating labellist/ writing to directory print('******************************************************') print('Saving All Class Labels For Referencing To Directory Completed') with open(os.path.join(result_dir, 'classes.txt'), 'w') as fp: for i in range(len(labellist)): # print('These are labellist i classes',i) #Not This fp.write('{}\n'.format(labellist[i])) # print('These are my labels: ==>',mylabel) for num, label in enumerate(labellist): for i in range(len(labels)): if label == labels[i]: labels[i] = num # print('This is labels i',labels[i]) #Not this if color: # conforming image channels of image for input sequence return np.array(X).transpose((0, 2, 3, 4, 1)), labels else: return np.array(X).transpose((0, 2, 3, 1)), labels print('**********************************************************') print('Generating Args Informative Messages/ Tuning Parameters Options Completed') def main(): parser = argparse.ArgumentParser(description='A 3D Convolution Model For Action Recognition') parser.add_argument('--batch', type=int, default=130) parser.add_argument('--epoch', type=int, default=100) parser.add_argument('--videos', type=str, default='dataset',help='Directory Where Videos Are Stored')# UCF101 parser.add_argument('--nclass', type=int, default= 2) parser.add_argument('--output', type=str, required=True) parser.add_argument('--color', type=bool, default=False) parser.add_argument('--skip', type=bool, default=True) parser.add_argument('--depth', type=int, default=10) args = parser.parse_args() # print('This is the Option Arguments ==>',args) print('**********************************************************') print('Specifying 
Input Size and Channels Completed') img_rows, img_cols, frames = 32, 32, args.depth channel = 3 if args.color else 1 print('**********************************************************') print('Saving Dataset As NPZ To Directory Completed') fname_npz = 'dataset_{}_{}_{}.npz'.format(args.nclass, args.depth, args.skip) vid3d = videoto3d.Videoto3D(img_rows, img_cols, frames) nb_classes = args.nclass # loading the data if os.path.exists(fname_npz): loadeddata = np.load(fname_npz) X, Y = loadeddata["X"], loadeddata["Y"] else: x, y = loaddata(args.videos, vid3d, args.nclass,args.output, args.color, args.skip) X = x.reshape((x.shape[0], img_rows, img_cols, frames, channel)) Y = np_utils.to_categorical(y, nb_classes) X = X.astype('float32') #save npzdata to file np.savez(fname_npz, X=X, Y=Y) print('Saved Dataset To dataset.npz. Completed') print('X_shape:{}\nY_shape:{}'.format(X.shape, Y.shape)) print('**********************************************************') print('Initialise Model Layers & Layer Parameters Completed') # Sequential groups a linear stack of layers into a tf.keras.Model. 
    # Sequential provides training and inference features on this model
    model = Sequential()
    model.add(Conv3D(32, kernel_size=(3, 3, 3), input_shape=(X.shape[1:]), padding='same'))
    model.add(Activation('relu'))
    model.add(Conv3D(32, kernel_size=(3, 3, 3), padding='same'))
    model.add(MaxPooling3D(pool_size=(3, 3, 3), padding='same'))
    model.add(Conv3D(64, kernel_size=(3, 3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(Conv3D(64, kernel_size=(3, 3, 3), padding='same'))
    model.add(MaxPooling3D(pool_size=(3, 3, 3), padding='same'))
    model.add(Conv3D(128, kernel_size=(3, 3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(Conv3D(128, kernel_size=(3, 3, 3), padding='same'))
    model.add(MaxPooling3D(pool_size=(3, 3, 3), padding='same'))
    model.add(Dropout(0.5))
    model.add(Conv3D(256, kernel_size=(3, 3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(Conv3D(256, kernel_size=(3, 3, 3), padding='same'))
    model.add(MaxPooling3D(pool_size=(3, 3, 3), padding='same'))
    model.add(Dropout(0.5))
    model.add(Flatten())
    # Dense function to convert FCL to 512 values
    model.add(Dense(512, activation='sigmoid'))
    model.add(Dropout(0.5))
    model.add(Dense(nb_classes, activation='softmax'))
    model.compile(loss=categorical_crossentropy, optimizer=Adam(), metrics=['accuracy'])
    model.summary()
    print('this is the model shape')
    model.output_shape
    plot_model(model, show_shapes=True, to_file=os.path.join(args.output, 'model.png'))
    print('**********************************************************')
    print("Train Test Method HoldOut Performance")
    X_train, Xval_test, Y_train, Yval_test = train_test_split(
        X, Y, train_size=0.8, test_size=0.2, random_state=1, stratify=Y, shuffle=True)
    print('**********************************************************')
    print('Deploying Data Fitting/ Performance Accuracy Guidance Completed')
    #Stop operations when experiencing no learning
    rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=1, mode='auto',
                                                  min_delta=0.0001, cooldown=1, min_lr=0.0001)
    # Fit the training data
    history = model.fit(X_train, Y_train, validation_split=0.20, batch_size=args.batch,
                        epochs=args.epoch, verbose=1, callbacks=[rlronp], shuffle=True)
    # Predict X_Test (Xval_test) data and Labels
    predict_labels = model.predict(Xval_test, batch_size=args.batch, verbose=1, use_multiprocessing=True)
    classes = np.argmax(predict_labels, axis = 1)
    label = np.argmax(Yval_test, axis = 1)
    print('This the BATCH size', args.batch)
    print('This the DEPTH size', args.depth)
    print('This the EPOCH size', args.epoch)
    print('This the TRAIN SPLIT size', len(X_train))
    print('This the TEST SPLIT size', len(Xval_test))
    # https://stackoverflow.com/questions/52261597/keras-model-fit-verbose-formatting
    # A json file enhances the model performance by a simple to save/load model
    model_json = model.to_json()
    if not os.path.isdir(args.output):
        os.makedirs(args.output)
    with open(os.path.join(args.output, 'ucf101_3dcnnmodel.json'), 'w') as json_file:
        json_file.write(model_json)
    # hd5 contains multidimensional arrays of scientific data
    model.save_weights(os.path.join(args.output, 'ucf101_3dcnnmodel.hd5'))
    ''' Evaluation is a process '''
    print('**********************************************************')
    print('Displying Test Loss & Test Accuracy Completed')
    loss, acc = model.evaluate(Xval_test, Yval_test, verbose=2, batch_size=args.batch,
                               use_multiprocessing=True)  # verbose 0
    print('this is args output', args.output)
    plot_history(history, args.output)
    save_history(history, args.output)
    print('**********************************************************')
    # Generating Picture Of Confusion matrix
    print('**********************************************************')
    print('Generating CM InputData/Classification Report Completed')
    #Ground truth (correct) target values.
    y_valtest_arg = np.argmax(Yval_test, axis=1)
    #Estimated targets as returned by a classifier
    Y_valpred = np.argmax(model.predict(Xval_test), axis=1)  # model
    print('y_valtest_arg Shape is ==>', y_valtest_arg.shape)
    print('Y_valpred Shape is ==>', Y_valpred.shape)
    print('**********************************************************')
    print('Classification_Report On Model Performance Completed==')
    print(classification_report(y_valtest_arg.round(), Y_valpred.round(),
                                target_names=filehandle, zero_division=1))
    '''Intitate Confusion Matrix'''
    # print('Model Confusion Matrix Per Test Data Completed===>')
    cm = confusion_matrix(y_valtest_arg, Y_valpred, normalize=None)
    print('Display Confusion Matrix ===>', cm)
    print('**********************************************************')
    print('Model Overall Accuracy')
    print('Model Test loss:', loss)
    print('**********************************************************')
    print('Model Test accuracy:', acc)
    print('**********************************************************')

if __name__ == '__main__':
    main()
I think the problem lies somewhere in the prediction step, the train/test split, or the evaluation arguments. However, I don't know how to inspect what train_test_split() actually produces, if that is where the issue is. Thank you in advance for any guidance; I would really appreciate help closing the gaps in my understanding.
Change all images in training set
I have a convolutional neural network that I want to train on images from the training set, but first each batch should be passed through my function change(tensor, float), which takes a tensor/image of the form [height, width, 3] and a float.
Batch size = 4
Loading data:
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)
CNN architecture:
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # size of inputs [4,3,32,32]
        # size of labels [4]
        inputs = change(inputs, 0.1)  # <----------------------------
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)  # [4, 10]
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0
print('Finished Training')
I am trying to apply the function change to the images, but it raises an object error. Is there a quick way to fix it? change is a Julia function, and it works completely fine with other objects.
Error message:
JULIA: MethodError: no method matching copy(::PyObject)
Closest candidates are:
copy(!Matched::T) where T<:SHA.SHA3_CTX at /opt/julia-1.7.2/share/julia/stdlib/v1.7/SHA/src/types.jl:213
copy(!Matched::T) where T<:SHA.SHA2_CTX at /opt/julia-1.7.2/share/julia/stdlib/v1.7/SHA/src/types.jl:212
copy(!Matched::Number) at /opt/julia-1.7.2/share/julia/base/number.jl:113
I would recommend adding the change function to the list of transforms, so that the data is changed at the transformation stage. partial from functools lets you fix the extra argument, like this:
from functools import partial

def change(input, float):
    pass

# Use partial to fix the number of params, so that change accepts only input
change_partial = partial(change, float=pass_float_value_here)

# Add change_partial to the list of transforms, before or after converting to tensors
transforms = Compose([
    RandomResizedCrop(img_size),  # example
    # Add change_partial here if it operates on PIL Image
    change_partial,
    ToTensor(),  # convert to tensor
    # Add change_partial here if it operates on torch tensors
    change_partial,
])
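The mechanics of that suggestion can be sketched without torchvision. This is only an illustration under stated assumptions: `change` here is a hypothetical stand-in that adds a constant to every pixel (the asker's real function is not shown), nested lists stand in for images, `to_float` plays the role of a conversion step such as ToTensor(), and `compose` is a hand-written helper that chains callables the way a transform list does.

```python
from functools import partial, reduce


def change(image, amount):
    """Hypothetical stand-in for the asker's function: add `amount` to every pixel."""
    return [[pixel + amount for pixel in row] for row in image]


def to_float(image):
    """Illustrative conversion step, playing the role of ToTensor()."""
    return [[float(pixel) for pixel in row] for row in image]


def compose(*steps):
    """Chain single-argument callables left to right, like a list of transforms."""
    def run(value):
        return reduce(lambda acc, step: step(acc), steps, value)
    return run


# Fix the second argument so the pipeline can call the result with the image only.
change_by_tenth = partial(change, amount=0.1)

# Every step in the pipeline now takes exactly one argument.
pipeline = compose(to_float, change_by_tenth)

if __name__ == "__main__":
    print(pipeline([[0, 1], [2, 3]]))
```

The point of partial is that a transform pipeline calls each step with a single argument, so any extra parameter has to be bound in advance. Whether this also avoids the Julia error depends on what type change receives at that stage, which the sketch does not address.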
Saving bert model at every epoch for further training
I am using bert_model.save_pretrained to save the model at the end of training, since it saves the model with all of its configuration and weights. However, it cannot be used inside model.fit: a checkpoint callback that saves the model at each epoch does not save it with save_pretrained. Can anybody help me save the BERT model at every epoch, since I cannot train the whole BERT model in one go?
Edit
Code for loading the pre-trained BERT model:
from transformers import TFAutoModelForSequenceClassification

bert_model = TFAutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=num_classes)
Code for compiling the BERT model:
from tensorflow.keras import optimizers

bert_model.compile(loss='categorical_crossentropy',
                   optimizer=optimizers.Adam(learning_rate=0.00005),
                   metrics=['accuracy'])
bert_model.summary()
Code for training and saving the BERT model:
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

checkpoint_filepath_1 = 'callbacks_models/BERT1.{epoch:02d}- {val_loss:.2f}.h5'
checkpoint_filepath_2 = 'callbacks_models/complete_best_BERT_model_1.h5'
# Saves the full model after every epoch
callbacks_1 = ModelCheckpoint(
    filepath=checkpoint_filepath_1,
    monitor='val_loss',
    mode='min',
    save_best_only=False,
    save_weights_only=False,
    save_freq='epoch')
# Saves only the model with the lowest validation loss so far
callbacks_2 = ModelCheckpoint(
    filepath=checkpoint_filepath_2,
    monitor='val_loss',
    mode='min',
    save_best_only=True)
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=5)
hist = bert_model.fit([train1_input_ids, train1_attention_masks], y_train1,
                      batch_size=16, epochs=1,
                      validation_data=([val_input_ids, val_attention_masks], y_val),
                      callbacks=[es, callbacks_1, callbacks_2, history_logger])
min_val_score = min(hist.history['val_loss'])
print("\nMinimum validation loss = ", min_val_score)
bert_model.save_pretrained("callbacks_models/Complete_BERT_model_1.h5")
Number of features of the model must match the input. Model n_features is 51 and input n_features is 55 error with BERT tokenizer
I am working on a classification model. My data has a Description column, which I tokenize with the BERT tokenizer.
def tokenization_and_encoding(data, model_name, independent_col, target_col):
    tokenizer = BertTokenizerFast.from_pretrained(model_name, do_lower_case=True)
    train_text = list(data[independent_col])
    train_labels = list(data[target_col])
    train_encodings = tokenizer(train_text, truncation=True, padding=True, max_length=256)
    train_encodings = train_encodings['input_ids']
    return train_encodings, train_labels

model_name = 'uncased_L-12_H-768_A-12/'
data = data[['Description', 'Target']]
# drop null values
data = data[data['Outage Description'].notnull()]
calibrated_svc = CalibratedClassifierCV(LinearSVC(), method='sigmoid')
calibrated_svc.fit(train_encodings, train_labels)
length_of_encoding = len(train_encodings[0])  # length is 51
pickle.dump(calibrated_svc, open(r".\model\bert__" + str(length_of_encoding) + ".pkl", 'wb'), protocol=4)

#########################################################################
##########################Prediction#####################################
tokenizer = BertTokenizerFast.from_pretrained(model_name, do_lower_case=True)
# get test text
test_text = list(test_data[independent_col])
# set encoding size
test_encodings_fix = [0] * 51
# encode text
test_encodings = tokenizer(test_text, truncation=True, padding=True, max_length=256)
test_encodings = test_encodings['input_ids']
# make encoding fixed length
for enc in test_encodings:
    test_encodings_fix_trim = test_encodings_fix[len(enc):51]
    enc.extend(test_encodings_fix_trim)
# load model
Pkl_Filename = r'\model_new\bert_model.pkl'
with open(Pkl_Filename, 'rb') as file:
    Pickled_svc_Model = pickle.load(file)
# predict
predict_svc_test_pred_bbc = pd.DataFrame(Pickled_svc_Model.predict(test_encodings))
Running the prediction module throws this error:
ValueError: Number of features of the model must match the input. Model n_features is 51 and input n_features is 55.
When I checked test_encodings, each encoding has length 55. My training data has 105 records and my test data has 5 records. I cannot figure out what I need to fix.
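The fixed-length step in the prediction code only appends zeros, so an encoding that is already longer than 51 is left at its original length. A minimal sketch of bringing every encoding to one width by both truncating and padding is given here. It assumes the training width of 51 reported in the question and a padding id of 0 (an assumption about the vocabulary in use); `to_fixed_length` is a hypothetical helper, not part of transformers or scikit-learn.

```python
def to_fixed_length(encodings, length, pad_id=0):
    """Return copies of the token-id lists, each exactly `length` items long."""
    fixed = []
    for ids in encodings:
        ids = list(ids[:length])                    # truncate anything longer
        ids.extend([pad_id] * (length - len(ids)))  # pad anything shorter
        fixed.append(ids)
    return fixed


if __name__ == "__main__":
    # One short and one over-long encoding, both brought to the training width.
    sample = [[101, 2023, 102], list(range(55))]
    print([len(ids) for ids in to_fixed_length(sample, length=51)])
```

An alternative that may fit better is to let the tokenizer do this at both training and prediction time with padding="max_length", truncation=True and the same max_length. Whether cutting token ids at 51 is acceptable for the classifier is a separate question; the sketch only makes the feature count match.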
Abnormal number of sims documents in gensim
I'm currently feeding 77 documents into a gensim model: a first script reads them from a database, and I save the result to the file system. I then load another document to check its similarity using an inferred vector.
def read_corpus_bdd(cursor, tokens_only=False):
    for i, (url_id, url_label, contenu) in enumerate(cursor):
        tokens = gensim.utils.simple_preprocess(contenu)
        if tokens_only:
            yield tokens
        else:
            # For training data, add tags
            # yield gensim.models.doc2vec.TaggedDocument(tokens, dataLine[0])
            yield gensim.models.doc2vec.TaggedDocument(tokens, [int(str(url_id))])
            print (int(str(url_id)))

targetContentCorpus = list(read_corpus_bdd(cursor))
# Param of trainer corpus
model = gensim.models.doc2vec.Doc2Vec(vector_size=40, min_count=2, epochs=40)
# Build a vocabulary
model.build_vocab(targetContentCorpus)
###############################################################################
model.train(targetContentCorpus, total_examples=model.corpus_count, epochs=model.epochs)
## generate file model name for save
from datetime import date
pathModelSave = os.getenv("MODEL_BASE_SAVE") + '/projet_' + str(projetId)
When I infer the vector:
inferred_vector = model.infer_vector(test_corpus[0])
sims = model.docvecs.most_similar([inferred_vector], topn=len(model.docvecs))
len(sims)
# output: 335
So I don't understand where this 335 comes from, nor why sims[0][0] returns an id other than the ones tagged in the yield statement.