Keras: Very high loss for Autoencoder - validation

I am trying to implement an autoencoder for prediction of multiple labels using Keras. This is a snippet:
from keras.layers import Input, Dense
from keras.models import Model
from keras.regularizers import l1

input = Input(shape=(768,))
hidden1 = Dense(512, activation='relu')(input)
compressed = Dense(256, activation='relu', activity_regularizer=l1(10e-6))(hidden1)
hidden2 = Dense(512, activation='relu')(compressed)
output = Dense(768, activation='sigmoid')(hidden2)  # sigmoid is used because the output of the autoencoder is a set of probabilities
model = Model(input, output)
model.compile(optimizer='adam', loss='categorical_crossentropy')  # categorical_crossentropy is used because it's a prediction of multiple labels
history = model.fit(x_train, x_train, epochs=100, batch_size=50, validation_split=0.2)
I ran this in a Jupyter notebook (CPU) and I get the following loss and validation loss:
loss: 193.8085 - val_loss: 439.7132
but when I run it in Google Colab (GPU), I get a very high loss and validation loss:
loss: 28383285849773932.0000 - val_loss: 26927464965996544.0000
What could be the reason for this behavior?
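One thing that might be worth checking in both environments is the scale of x_train: with a sigmoid output and categorical_crossentropy, the reconstruction targets are expected to behave like probabilities in [0, 1], and unscaled inputs can inflate the loss enormously. A minimal sanity check (a sketch, assuming x_train is a NumPy array) is:

# Hypothetical sanity check: crossentropy with a sigmoid output assumes
# the targets (here x_train itself) lie in [0, 1].
print("min:", x_train.min(), "max:", x_train.max())
if x_train.min() < 0.0 or x_train.max() > 1.0:
    # rescale to [0, 1] before training (assumes non-constant data)
    x_train = (x_train - x_train.min()) / (x_train.max() - x_train.min())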

Related

Binary Image Classification - Validation loss is much higher than training loss

I'm facing a strange behaviour which I can't figure out. I'm getting a really high loss (BinaryCrossentropy) on my validation batch, around 20 or even higher, while training. But after training, when I make a prediction on the test set, I get a loss which is lower than 1. Why is that? I have gone through my code over and over and can't find the problem.
I'm doing a binary image classification for brain tumors on a dataset provided via Kaggle (link).
And you can find my notebook here: Google-Colab Notebook
My data is loaded this way:
batch_size = 20

train_ds = tf.keras.utils.image_dataset_from_directory(
    train_data_path,
    subset='training',
    seed=42,
    color_mode='grayscale',
    batch_size=batch_size,
    validation_split=0.30
)

valid_ds = tf.keras.utils.image_dataset_from_directory(
    train_data_path,
    subset='validation',
    seed=42,
    batch_size=batch_size,
    color_mode='grayscale',
    validation_split=0.30
)

test_ds = tf.keras.utils.image_dataset_from_directory(
    test_data_path,
    color_mode='grayscale',
    batch_size=batch_size,
    shuffle=False
)
This is my model structure:
input_shape = image_batch[0].shape

# set up the model structure
model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.3),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.3),
    layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid")
])

model.summary()
This is my callback function, which plots the learning curves during training:
class PlotLearning(tf.keras.callbacks.Callback):
    """
    Callback to plot the learning curves of the model during training.
    """
    def on_train_begin(self, logs={}):
        self.metrics = {}
        for metric in logs:
            self.metrics[metric] = []

    def on_epoch_end(self, epoch, logs={}):
        # Storing metrics
        print(logs)
        for metric in logs:
            if metric in self.metrics:
                self.metrics[metric].append(logs.get(metric))
            else:
                self.metrics[metric] = [logs.get(metric)]

        # Plotting
        metrics = [x for x in logs if 'val' not in x]
        f, axs = plt.subplots(1, len(metrics), figsize=(15, 5))
        clear_output(wait=True)

        for i, metric in enumerate(metrics):
            axs[i].plot(range(1, epoch + 2),
                        self.metrics[metric],
                        label=metric)
            if logs['val_' + metric]:
                axs[i].plot(range(1, epoch + 2),
                            self.metrics['val_' + metric],
                            label='val_' + metric)
            axs[i].legend()
            axs[i].grid()

        plt.tight_layout()
        plt.show()

callbacks_list = [PlotLearning()]
And this is the part where I start the training:
# compile model
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)

model.compile(optimizer=optimizer,
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              metrics=['accuracy'])

# fit model
history = model.fit(prep_train_ds,
                    epochs=30,
                    validation_data=valid_ds,
                    callbacks=callbacks_list)
This is the output of the callback function after the last epoch:
As you can see, the validation loss is really high and oscillating around 20, so I guess it is overfitting.
But as mentioned above, here is what I get when I make a prediction on the test set and calculate the binary crossentropy: the loss is again less than 1 and at least in the range of the training loss.
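For reference, a minimal sketch of that test-set check (an assumption of how it might be done, given test_ds as loaded above) is:

import tensorflow as tf

# Hypothetical reconstruction of the manual test-set check described above:
# collect the true labels, predict, and compute the binary crossentropy by hand.
y_true = tf.concat([labels for _, labels in test_ds], axis=0)
y_pred = model.predict(test_ds)
bce = tf.keras.losses.BinaryCrossentropy(from_logits=False)
print("manual test BCE:", bce(tf.cast(y_true, tf.float32), tf.squeeze(y_pred)).numpy())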
I have tried many things, like changing the batch size (because not enough samples of one class might be in one batch). Then I wanted to see whether it is overfitting, so I changed the number of filters, applied dropout, etc., but I couldn't get the loss down on the validation set. I'm quite new to the field of image classification and maybe I'm overlooking something.

Why is my model giving different results with the same parameters?

I am tuning my model with these parameters:
param_grid = {
    'unit_1': 128,
    'unit_2': 64,
    'lr': 3e-4,
    'activ': 'relu',
    'epochs': 400,
    'batch_size': 8
}
Which is to say, I already know the parameters I'll be using, but I run these through the search function anyway, like so:
kgs = KerasGridSearch(build_classifier_model, param_grid,
                      monitor='val_loss', greater_is_better=False)

kgs.search({'text': x_train_text, 'vector': x_train_vector}, y_train,
           sample_weight=y_train_weight,
           batch_size=param_grid['batch_size'],
           validation_data=({'text': x_val_text, 'vector': x_val_vector},
                            y_val, y_val_weight),
           callbacks=[es])
I then evaluate the model that results from this search and get some results:
Loss: 1.5514
Accuracy: 0.6601
Weighted accuracy: 0.6879
However, when I try to train a new model using the same parameters, this time with the fit() function, I get very different results.
history = classifier_model.fit({'text': x_train_text, 'vector': x_train_vector},
                               y_train,
                               sample_weight=y_train_weight,
                               steps_per_epoch=len(x_train_text) // best_params['batch_size'],
                               validation_data=({'text': x_val_text,
                                                 'vector': x_val_vector},
                                                y_val, y_val_weight),
                               batch_size=best_params['batch_size'],
                               epochs=best_params['epochs'] + 10,
                               callbacks=[es])
Loss: 1.0226
Accuracy: 0.4915
Weighted accuracy: 0.5092
I have run the notebook repeatedly and the same thing happens. It is not making sense to me. I hope it isn't something obvious I'm missing. Please let me know if there is more information I can include to help solve this issue.
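One thing worth checking here (a hedged sketch, not specific to KerasGridSearch) is whether the gap survives when the run-to-run randomness is pinned down, since search() and fit() otherwise start from different random weight initializations and shuffles:

import tensorflow as tf

# Assumption: available in recent TensorFlow versions; fixes the Python,
# NumPy and TensorFlow seeds so both runs start from identical initial weights.
tf.keras.utils.set_random_seed(42)

If the two results still differ by this much with fixed seeds, the difference is more likely in the call itself (for example, the fit() call above passes steps_per_epoch while the search() call does not).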

Saving bert model at every epoch for further training

I am using bert_model.save_pretrained to save the model at the end, since this is the command that saves the model with all configurations and weights, but it cannot be used inside model.fit: the ModelCheckpoint callback that saves the model at each epoch does not save with save_pretrained. Can anybody help me save the BERT model at each epoch, since I cannot train the whole BERT model in one go?
Edit
Code for loading the pre-trained BERT model:
bert_model = TFAutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=num_classes)
Code for compiling the BERT model:
from tensorflow.keras import optimizers

bert_model.compile(loss='categorical_crossentropy',
                   optimizer=optimizers.Adam(learning_rate=0.00005),
                   metrics=['accuracy'])

bert_model.summary()
Code for training and saving the BERT model:
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

checkpoint_filepath_1 = 'callbacks_models/BERT1.{epoch:02d}-{val_loss:.2f}.h5'
checkpoint_filepath_2 = 'callbacks_models/complete_best_BERT_model_1.h5'

callbacks_1 = ModelCheckpoint(
    filepath=checkpoint_filepath_1,
    monitor='val_loss',
    mode='min',
    save_best_only=False,
    save_weights_only=False,
    save_freq='epoch')

callbacks_2 = ModelCheckpoint(
    filepath=checkpoint_filepath_2,
    monitor='val_loss',
    mode='min',
    save_best_only=True)

es = EarlyStopping(monitor='val_loss', mode='min', verbose=1,
                   patience=5)

hist = bert_model.fit([train1_input_ids, train1_attention_masks],
                      y_train1, batch_size=16, epochs=1,
                      validation_data=([val_input_ids, val_attention_masks], y_val),
                      callbacks=[es, callbacks_1, callbacks_2, history_logger])

min_val_score = min(hist.history['val_loss'])
print("\nMinimum validation loss = ", min_val_score)

bert_model.save_pretrained("callbacks_models/Complete_BERT_model_1.h5")
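For what it's worth, a minimal sketch of a custom callback that calls save_pretrained at the end of every epoch (an assumption of how this could be wired up with the Hugging Face model above, saving one directory per epoch) is:

import os
import tensorflow as tf

class SavePretrainedCallback(tf.keras.callbacks.Callback):
    # Hypothetical helper: save the Hugging Face model with save_pretrained
    # after every epoch, into a separate directory per epoch.
    def __init__(self, hf_model, output_dir):
        super().__init__()
        self.hf_model = hf_model
        self.output_dir = output_dir

    def on_epoch_end(self, epoch, logs=None):
        path = os.path.join(self.output_dir, "epoch_{:02d}".format(epoch + 1))
        self.hf_model.save_pretrained(path)

save_cb = SavePretrainedCallback(bert_model, "callbacks_models/bert_epochs")

It can then be added to the callbacks list passed to bert_model.fit, and any saved epoch can later be reloaded with TFAutoModelForSequenceClassification.from_pretrained(path) to continue training.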

Keras validation loss and metric inconsistent

RNN model used for regression, cf. Chollet, Deep Learning with Python, 6.3.1 A temperature-forecasting problem
In this example I used random data, both regressors and regressand
I have used the mean absolute error, both as loss function and as a metric
I do not understand the values I get for val_loss and val_mean_absolute_error. Neither of them makes sense to me.
code:
import tensorflow as tf
import numpy as np
from keras.models import Sequential
from keras import layers
from keras.optimizers import Adam
import keras
I use random input data:
data_np = np.random.rand(6400,10)
target_np = np.random.rand(6400,)
Normalizing the data:
mean1 = data_np[:].mean(axis=0)
std1 = data_np[:].std(axis=0)
data_np -= mean1
data_np /= std1
mean2 = target_np.mean(axis=0)
std2 = target_np.std(axis=0)
target_np -= mean2
target_np /= std2
Create RNN input with lookback:
lookback = 7
train_data = np.array([data_np[(i-lookback):i,:] for i in range(lookback,len(data_np))])
target_data = target_np[lookback:len(data_np)]
And then set up a simple RNN:
model = Sequential()
model.add(layers.SimpleRNN(64,
                           activation='relu',
                           return_sequences=False,
                           input_shape=(train_data.shape[1], train_data.shape[2])))
model.add(layers.Dense(1))

opt = Adam(learning_rate=0.1)
mae = tensorflow.keras.losses.MeanAbsoluteError()
model.compile(optimizer=opt, loss=mae, metrics=[mae])

history = model.fit(train_data, target_data,
                    steps_per_epoch=round(0.7 * len(train_data)) // 64,
                    epochs=10,
                    shuffle=False,
                    validation_split=0.3,
                    validation_steps=round(0.3 * len(train_data)) // 64,
                    verbose=1)
The output then looks like this:
Train on 3495 samples, validate on 1498 samples
Epoch 1/10
54/54 [==============================] - 2s 38ms/step - loss: 0.7955 - mean_absolute_error: 0.7955 - val_loss: 0.0428 - val_mean_absolute_error: 22.6301
Epoch 2/10
54/54 [==============================] - 2s 30ms/step - loss: 0.7152 - mean_absolute_error: 0.7152 - val_loss: 0.0421 - val_mean_absolute_error: 22.2968
I would expect val_loss and val_mean_absolute_error to be the same. Moreover, the levels don't make much sense either. After 10 epochs, I get
Epoch 10/10
54/54 [==============================] - 2s 32ms/step - loss: 0.7747 - mean_absolute_error: 0.7747 - val_loss: 0.0409 - val_mean_absolute_error: 21.6337
If I calculate the mean absolute error manually:
N=len(data_np)
val_data = np.array([data_np[(i-lookback):i,:] for i in range(round(0.7*N),N)])
val_target = target_np[round(0.7*N):N]
model_output = model.predict(val_data)
model_output=[output[0] for output in model_output]
np.mean(abs(model_output-val_target))
0.940300949276649
This looks like a result that one could expect. However, it is not even close to either val_loss or val_mean_absolute_error. What is wrong here?
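For comparison, one could also let Keras compute the error on the same slice (a sketch, assuming val_data and val_target as constructed above):

# Hypothetical cross-check: Keras' own evaluation on the same slice that the
# manual MAE above is computed on.
val_loss, val_mae = model.evaluate(val_data, val_target, batch_size=64, verbose=0)
print(val_loss, val_mae)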
OK. I managed to solve the issue by consistently using tensorflow.keras. So, replacing
import tensorflow as tf
import numpy as np
from keras.models import Sequential
from keras import layers
from keras.optimizers import Adam
import keras
with
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam
import tensorflow.keras
(and corrected a couple of details in the original question)

LSTM - LSTM - future value prediction error

After some research, I was able to predict the future value using the LSTM code below. I have also attached the Dmd1ahr.csv file that I am using at the GitHub link:
https://github.com/ukeshchawal/hello-world/blob/master/Dmd1ahr.csv
As you can see below, 90 data points are the training set and the 91st to 100th are the future value prediction.
However, some of the questions that I still have are:
In order to predict these values I originally had to take more than a hundred data points (here, I have taken 500), which is not exactly my primary goal. Is there a way that, given 500 data points, it will predict the next 10 or 20 out-of-sample points? If yes, could you please write sample code that just takes 500 data points from the Dmd1ahr.csv file attached below and predicts some future values (say 501 to 520) based on those 500 points? (See the sketch after the code below.)
The predictions are way off compared to the ones you have in your blogs (which definitely indicates a need for parameter tuning - I tried changing the epochs, LSTM layers, activation, and optimizer). What other parameter tuning can I do to make it more robust?
Thank you all in advance.
import numpy as np
import matplotlib.pyplot as plt
import pandas
# By tweaking the architecture it could be made more robust
np.random.seed(7)
numOfSamples = 500
lengthTrain = 90
lengthValidation = 100
look_back = 1 # Can be set higher, in my experiments it made performance worse though
transientTime = 90 # Time to "burn in" time series
series = pandas.read_csv('Dmd1ahr.csv')
def generateTrainData(series, i, look_back):
    return series[i:look_back+i+1]
trainX = np.stack([generateTrainData(series, i, look_back) for i in range(lengthTrain)])
testX = np.stack([generateTrainData(series, lengthTrain + i, look_back) for i in range(lengthValidation)])
trainX = trainX.reshape((lengthTrain,look_back+1,1))
testX = testX.reshape((lengthValidation, look_back + 1, 1))
trainY = trainX[:,1:,:]
trainX = trainX[:,:-1,:]
testY = testX[:,1:,:]
testX = testX[:,:-1,:]
############### Build Model ###############
import keras
from keras.models import Model
from keras import layers
from keras import regularizers
inputs = layers.Input(batch_shape=(1, look_back, 1), name="main_input")
inputsAux = layers.Input(batch_shape=(1, look_back, 1), name="aux_input")

# this layer makes the actual prediction, i.e. decides if and how much it goes up or down
x = layers.recurrent.LSTM(300, return_sequences=True, stateful=True)(inputs)
x = layers.recurrent.LSTM(200, return_sequences=True, stateful=True)(inputs)
x = layers.recurrent.LSTM(100, return_sequences=True, stateful=True)(inputs)
x = layers.recurrent.LSTM(50, return_sequences=True, stateful=True)(inputs)
x = layers.wrappers.TimeDistributed(layers.Dense(1, activation="linear",
                                                 kernel_regularizer=regularizers.l2(0.005),
                                                 activity_regularizer=regularizers.l1(0.005)))(x)

# auxiliary input: the current input is fed directly to the output.
# This way the prediction from the step before is used as a "base", and the
# network just has to learn whether it goes a little up or down.
auxX = layers.wrappers.TimeDistributed(layers.Dense(1,
                                                    kernel_initializer=keras.initializers.Constant(value=1),
                                                    bias_initializer='zeros',
                                                    input_shape=(1, 1), activation="linear", trainable=False
                                                    ))(inputsAux)

outputs = layers.add([x, auxX], name="main_output")
model = Model(inputs=[inputs, inputsAux], outputs=outputs)

model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['mean_squared_error'])
#model.summary()

#model.fit({"main_input": trainX, "aux_input": trainX[look_back-1,look_back,:]},{"main_output": trainY}, epochs=4, batch_size=1, shuffle=False)
model.fit({"main_input": trainX, "aux_input": trainX[:, look_back-1, :].reshape(lengthTrain, 1, 1)},
          {"main_output": trainY}, epochs=100, batch_size=1, shuffle=False)
############### make predictions ###############
burnedInPredictions = np.zeros(transientTime)
testPredictions = np.zeros(len(testX))

# burn the series in; here, use the first transientTime samples from the test data
for i in range(transientTime):
    prediction = model.predict([np.array(testX[i, :, 0].reshape(1, look_back, 1)),
                                np.array(testX[i, look_back - 1, 0].reshape(1, 1, 1))])
    testPredictions[i] = prediction[0, 0, 0]

burnedInPredictions[:] = testPredictions[:transientTime]

# prediction: now don't use any previous data at all, the network just has to run on its own output
for i in range(transientTime, len(testX)):
    prediction = model.predict([prediction, prediction])
    testPredictions[i] = prediction[0, 0, 0]

# for plotting reasons
testPredictions[:np.size(burnedInPredictions) - 1] = np.nan
############### plot results ###############
#import matplotlib.pyplot as plt
plt.plot(testX[:, 0, 0])
plt.show()
plt.plot(burnedInPredictions, label = "training")
plt.plot(testPredictions, label = "prediction")
plt.legend()
plt.show()
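Regarding question 1, here is a minimal sketch (no normalization or tuning, just the mechanics, and assuming Dmd1ahr.csv holds a single numeric column) of fitting on 500 points and then rolling the model forward to forecast points 501 to 520 by feeding each prediction back in as the next input:

import numpy as np
import pandas
from keras.models import Sequential
from keras import layers

series = pandas.read_csv('Dmd1ahr.csv').values.astype('float32').flatten()
look_back = 10
train = series[:500]

# build (samples, look_back, 1) windows and next-step targets from the 500 points
X = np.array([train[i:i + look_back] for i in range(len(train) - look_back)])[..., None]
y = train[look_back:]

forecast_model = Sequential()
forecast_model.add(layers.LSTM(50, input_shape=(look_back, 1)))
forecast_model.add(layers.Dense(1))
forecast_model.compile(optimizer='adam', loss='mean_squared_error')
forecast_model.fit(X, y, epochs=20, batch_size=16, verbose=0)

# recursive out-of-sample forecast: points 501 to 520
window = train[-look_back:].tolist()
forecast = []
for _ in range(20):
    next_val = forecast_model.predict(np.array(window[-look_back:])[None, :, None], verbose=0)[0, 0]
    forecast.append(float(next_val))
    window.append(float(next_val))
print(forecast)

In practice one would scale the series (for example to [0, 1]) before fitting and invert the scaling afterwards; for LSTMs that usually matters more than the exact layer sizes.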
