Overfitting in my Convolutional Neural Network Model - image

I have built a CNN model with the CUB-200-2011 dataset. But my model accuracy and validation accuracy have a huge different which my accuracy have 0.6 above and my validation accuracy only have 0.2.
I have tried to add data augmentation and tried with different model architecture. I have no idea why this situation occurs.
#get train dataset
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
path,
validation_split=0.2,
subset="training",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
#get validation dataset
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
path,
validation_split=0.2,
subset="validation",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
# Create data augmentation layer
data_augmentation = tf.keras.Sequential([
tf.keras.layers.experimental.preprocessing.RandomFlip("horizontal"),
tf.keras.layers.experimental.preprocessing.RandomRotation(0.1),
tf.keras.layers.experimental.preprocessing.RandomZoom(0.2)
])
#model architecture
model = tf.keras.models.Sequential([
data_augmentation,
tf.keras.layers.experimental.preprocessing.Rescaling(1./255),
tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(img_height, img_width, 3)),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Conv2D(64, 3, activation="relu"),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Conv2D(128, 3, activation="relu"),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(512, activation="relu"),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(200, activation="softmax")
])

Related

Classification Image, Tensorflow, Accuracy stuck in 10%

i'm actually doing the CINIC-10 Classification image challenge for my IT studies.
i never had DeepLearning experience before so i learnt it with some youtube videos.
I first tried the MNIST dataset for handwritting numbers and i had a great experience from it.
My model had a 92% chance of prediction and it worked great.
now i'm Trying to classify some images and even when i use different models from Keras my training model don't go above 10% of accuracy.
here's how i proceeded :
First i'm loading my Dasasets i have a train dataset and a validation dataset.
# loading in the data
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
cinic_directory_train,
validation_split=0.2,
subset="training",
seed=123,
image_size=(32, 32),
batch_size=16
)
validation_ds = tf.keras.preprocessing.image_dataset_from_directory(
cinic_directory_train,
validation_split=0.2,
subset="validation",
seed=123,
image_size=(32, 32),
batch_size=16
)
with that i can get my clases names
class_names= train_ds.class_names
print(class_names)
Output :
['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'\]
and this is my model construction :
model = keras.Sequential([
keras.layers.experimental.preprocessing.Rescaling(1./255),
keras.layers.Conv2D(16, 3, padding='same', activation='relu'),
keras.layers.MaxPooling2D(),
keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
keras.layers.MaxPooling2D(),
keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
keras.layers.MaxPooling2D(),
keras.layers.Dropout(0.2),
keras.layers.Flatten(),
keras.layers.Dense(128, activation='relu'),
keras.layers.Dense(10)
])
model.compile(
optimizer='adam', #Fonction d'optimisation
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
And when i start the train session
history = model.fit(
train_ds,
validation_data=validation_ds,
epochs=3
)
my Accuracy is stuck between 0.09 and 0.10
I even tested my friends code and i keep getting the same accuracy beside they get like 30-50% of accuracy.
I'm using google Collabs for this.
I tried all those model and i keep getting alow accuracy :
VVG16 => 9%
Resnet50 => 9%
DenseNEt => 8%
EfficientNet => 2%
MobileNet => 9%
I can't find my problem and how to fix it!
you final layer should be
keras.layers.Dense(10. activation='softmax')

Binary Image Classification - Validation loss is much higher than training loss

I´m facing a strange behaviour which I can´t figure out why it is happening. I´m getting a really high loss(BinaryCrossentropy) on my validation batch around 20 or even higher while training. But after the training I do a prediction on the tet set and I get a loss which is lower than 1. Why is that? I went through my code over and over and can´t find the problem.
I´m doing a binary image classification for brian tumors on a dataset provided via kaggle(Link.
And you can find my notebook here: Google-Colab Notebook
My data is loaded this way:
batch_size = 20
train_ds = tf.keras.utils.image_dataset_from_directory(
train_data_path,
subset='training',
seed=42,
color_mode='grayscale',
batch_size=batch_size,
validation_split=0.30
)
valid_ds = tf.keras.utils.image_dataset_from_directory(
train_data_path,
subset='validation',
seed=42,
batch_size=batch_size,
color_mode='grayscale',
validation_split=0.30
)
test_ds = tf.keras.utils.image_dataset_from_directory(
test_data_path,
color_mode='grayscale',
batch_size=batch_size,
shuffle=False
)
This is my modle strcuture
input_shape = image_batch[0].shape
# set up the model structure
model = tf.keras.Sequential([
layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
layers.MaxPooling2D((2,2)),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Dropout(0.3),
layers.Conv2D(64, (3, 3), activation='relu'),
layers.MaxPooling2D((2, 2)),
layers.Dropout(0.3),
layers.Flatten(),
tf.keras.layers.Dense(32, activation="relu"),
layers.Dropout(0.3),
layers.Dense(1, activation="sigmoid")
])
model.summary()
This is my callback function which returns the plots during training:
class PlotLearning(tf.keras.callbacks.Callback):
"""
Callback to plot the learning curves of the model during training.
"""
def on_train_begin(self, logs={}):
self.metrics = {}
for metric in logs:
self.metrics[metric] = []
def on_epoch_end(self, epoch, logs={}):
# Storing metrics
print(logs)
for metric in logs:
if metric in self.metrics:
self.metrics[metric].append(logs.get(metric))
else:
self.metrics[metric] = [logs.get(metric)]
# Plotting
metrics = [x for x in logs if 'val' not in x]
f, axs = plt.subplots(1, len(metrics), figsize=(15,5))
clear_output(wait=True)
for i, metric in enumerate(metrics):
axs[i].plot(range(1, epoch + 2),
self.metrics[metric],
label=metric)
if logs['val_' + metric]:
axs[i].plot(range(1, epoch + 2),
self.metrics['val_' + metric],
label='val_' + metric)
axs[i].legend()
axs[i].grid()
plt.tight_layout()
plt.show()
callbacks_list = [PlotLearning()]
and this is the part where I start the training
# compile model
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
model.compile(optimizer=optimizer,
loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
metrics=['accuracy']
)
# fit model
history = model.fit(prep_train_ds,
epochs=30,
validation_data=valid_ds,
callbacks=callbacks_list)
This is the output of the callback function after the last epoch run through:
As you can see the loss is really high and oscillating around 20, so I guess it is overfitting.
But as mentiod above, here is what I get when I make a prediction on the test set and calculate the binary crossentropy. The loss is again less than 1 and at least in the range of the training loss
I tried so many things like, chaning batch size, bcs. not enough samples of one class might be in one batch. Then I wanted to see if it is overfitting and changed the number of filters, applyed droput etc. But I couldn´t get the loss function down on the validation set. I´m quite new in the field of image classification and maybe I´m oversseing a thing.

Properly evaluate a test dataset

I trained a machine translation model using huggingface library:
def compute_metrics(eval_preds):
preds, labels = eval_preds
if isinstance(preds, tuple):
preds = preds[0]
decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
# Replace -100 in the labels as we can't decode them.
labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
# Some simple post-processing
decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)
result = metric.compute(predictions=decoded_preds, references=decoded_labels)
result = {"bleu": result["score"]}
prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
result["gen_len"] = np.mean(prediction_lens)
result = {k: round(v, 4) for k, v in result.items()}
return result
trainer = Seq2SeqTrainer(
model,
args,
train_dataset=tokenized_datasets['train'],
eval_dataset=tokenized_datasets['test'],
data_collator=data_collator,
tokenizer=tokenizer,
compute_metrics=compute_metrics
)
trainer.train()
model_dir = './models/'
trainer.save_model(model_dir)
The code above is taken from this Google Colab notebook. After the training, I can see the trained model is saved to the folder models and the metric is calculated. Now I want to load the trained model and do the prediction on a new dataset, here is what I tried:
dataset = load_dataset('csv', data_files='data/training_data.csv')
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
# Tokenize the test dataset
tokenized_datasets = train_test.map(preprocess_function_v2, batched=True)
test_dataset = tokenized_datasets['test']
model = AutoModelForSeq2SeqLM.from_pretrained('models')
model(test_dataset)
It threw the following error:
*** AttributeError: 'Dataset' object has no attribute 'size'
I tried the evaluate() function as well, but it said:
*** torch.nn.modules.module.ModuleAttributeError: 'MarianMTModel' object has no attribute 'evaluate'
And the function eval only prints the configuration of the model.
What is the proper way to evaluate the performance of the trained model on a new dataset?
Turned out that the prediction can be produced using the following code:
inputs = tokenizer(
questions,
max_length=max_input_length,
truncation=True,
return_tensors='pt',
padding=True).to('cuda')
translation = model.generate(**inputs)

Saving bert model at every epoch for further training

I am using bert_model.save_pretrained for saving the model at end as this is the command that helps in saving the model with all configurations and weights but this cannot be used in model.fit command as in callbacks saving model at each epoch does not save with save_pretrained. Can anybody help me in saving bert model at each epoch since i cannot train whole bert model in one go?
Edit
Code for loading pre trained bert model
bert_model = TFAutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=num_classes)
Code for compiling the bert model
from tensorflow.keras import optimizers
bert_model.compile(loss='categorical_crossentropy',
optimizer=optimizers.Adam(learning_rate=0.00005),
metrics=['accuracy'])
bert_model.summary()
Code for training and saving the bert model
checkpoint_filepath_1 = 'callbacks_models/BERT1.{epoch:02d}-
{val_loss:.2f}.h5'
checkpoint_filepath_2 = 'callbacks_models/complete_best_BERT_model_1.h5'
callbacks_1 = ModelCheckpoint(
filepath=checkpoint_filepath_1,
monitor='val_loss',
mode='min',
save_best_only=False,
save_weights_only=False,
save_freq='epoch')
callbacks_2 = ModelCheckpoint(
filepath=checkpoint_filepath_2,
monitor='val_loss',
mode='min',
save_best_only=True)
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1,
patience=5)
hist = bert_model.fit([train1_input_ids, train1_attention_masks],
y_train1, batch_size=16, epochs=1,validation_data=
([val_input_ids, val_attention_masks], y_val),
callbacks
[es,callbacks_1,callbacks_2,history_logger])
min_val_score = min(hist.history['val_loss'])
print ("\nMinimum validation loss = ", min_val_score)
bert_model.save_pretrained("callbacks_models/Complete_BERT_model_1.h5")

Good training and validation accuracy but poor confusion matrix

I have training my model to detect normal vs pneumonia chest x-ray classes. This is my dataset as listed below:
train_batch= ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input)\
.flow_from_directory(directory=train_path, target_size=(224,224), classes=['NORMAL', 'PNEUMONIA'],
batch_size=32,class_mode='categorical')
val_batch= ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
.flow_from_directory(directory=val_path, target_size=(224,224), classes=['NORMAL', 'PNEUMONIA'], batch_size=32, class_mode='categorical')
test_batch= ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input) \
.flow_from_directory(directory=test_path, target_size=(224,224), classes=['NORMAL', 'PNEUMONIA'], batch_size=16,class_mode='categorical', shuffle=False)
Found 3616 images belonging to 2 classes. #training
Found 1616 images belonging to 2 classes. #validation
Found 624 images belonging to 2 classes. #test
my model consist of 5 CNN layers where image w,h = (224* 224,3) with 16 feature map as first layer and then 32, 64, 128,256. Batch normalization , max pooling and dropout is added to every cnn layer, but last dense layer is as follow
model.add(Dense(units=2 , activation='softmax'))
optim = Adam( lr=0.001 )
model.compile(optimizer=optim , loss= 'categorical_crossentropy' , metrics= ['accuracy'])
history=model.fit_generator(train_batch,
steps_per_epoch= 113, #3616/32=113
epochs = 25,
validation_data = val_batch,
validation_steps = 51 #1616/32=51
#verbose=2
#callbacks=callbacks #remove to chk
)
as it can be seen in the graph that my training and validation accuracy and loss is good but when I plot confusion matrix it dose not seems good why??
prediction = model.predict_generator(test_batch,steps= stepss) #, verbose=0)
prediction1 = np.argmax(prediction, axis=1)
cm = confusion_matrix (test_batch.classes, prediction1)
print(cm)
this is my confusion matrix as below
as you can see my graph which is as below
after that I did fine tuning of my model with VGG!6 by replacing last dense layer with my own dense layer with two outputs and here is the graph and confusion matrix:
I do not understand why my testing in not going good even with vgg16 model as you can see the results so please give me your valuable suggestions THANKS

Resources