Why is my CNN not learning - debugging

I am sorry for such a cliché question, but I really don't know why my CNN is not improving.
I am training a CNN on the SVHN dataset (single digits) with 32x32 images.
For preprocessing, I convert RGB to grayscale and standardize all pixel values, so the data falls roughly in the range (-1, 1). To verify that X and y correspond to each other correctly, I randomly picked an image from X and the label from y at the same index, and they match.
Here's my code (Keras, tensorflow backend):
"""
Single Digit Recognition
"""
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Activation, Convolution2D
from keras.layers.pooling import MaxPooling2D
from keras.optimizers import SGD
from keras.layers.core import Dropout, Flatten
model = Sequential()
model.add(Convolution2D(16, 5, 5, border_mode='same', input_shape=(32, 32, 1)))
model.add(Activation("relu"))
model.add(Dropout(0.5))
model.add(MaxPooling2D(pool_size=(2, 2), strides=None, border_mode='same', dim_ordering='default'))
model.add(Convolution2D(32, 5, 5, border_mode='same', input_shape=(16, 16, 16)))
model.add(Activation("relu"))
model.add(Dropout(0.5))
model.add(MaxPooling2D(pool_size=(2, 2), strides=None, border_mode='same', dim_ordering='default'))
model.add(Convolution2D(64, 5, 5, border_mode='same', input_shape=(32, 8, 8)))
model.add(Activation("relu"))
model.add(Dropout(0.5))
model.add(MaxPooling2D(pool_size=(2, 2), strides=None, border_mode='same', dim_ordering='default'))
model.add(Flatten())
model.add(Dense(128, input_dim=1024))
model.add(Activation("relu"))
model.add(Dense(10, input_dim=128))
model.add(Activation('softmax'))
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
model.fit(train_X, train_y,
          validation_split=0.1,
          nb_epoch=20,
          batch_size=64)
score = model.evaluate(test_X, test_y, batch_size=16)
After running 10 epochs, the accuracy is still the same as in the first epoch, and that's why I stopped it.
Train on 65931 samples, validate on 7326 samples
Epoch 1/20
65931/65931 [==============================] - 190s - loss: 2.2390 - acc: 0.1882 - val_loss: 2.2447 - val_acc: 0.1885
Epoch 2/20
65931/65931 [==============================] - 194s - loss: 2.2395 - acc: 0.1893 - val_loss: 2.2399 - val_acc: 0.1885
Epoch 3/20
65931/65931 [==============================] - 167s - loss: 2.2393 - acc: 0.1893 - val_loss: 2.2402 - val_acc: 0.1885
Epoch 4/20
65931/65931 [==============================] - 172s - loss: 2.2394 - acc: 0.1883 - val_loss: 2.2443 - val_acc: 0.1885
Epoch 5/20
65931/65931 [==============================] - 172s - loss: 2.2393 - acc: 0.1884 - val_loss: 2.2443 - val_acc: 0.1885
Epoch 6/20
65931/65931 [==============================] - 179s - loss: 2.2397 - acc: 0.1881 - val_loss: 2.2433 - val_acc: 0.1885
Epoch 7/20
65931/65931 [==============================] - 173s - loss: 2.2399 - acc: 0.1888 - val_loss: 2.2410 - val_acc: 0.1885
Epoch 8/20
65931/65931 [==============================] - 175s - loss: 2.2392 - acc: 0.1893 - val_loss: 2.2439 - val_acc: 0.1885
Epoch 9/20
65931/65931 [==============================] - 175s - loss: 2.2395 - acc: 0.1893 - val_loss: 2.2401 - val_acc: 0.1885
Epoch 10/20
9536/65931 [===>..........................] - ETA: 162s - loss: 2.2372 - acc: 0.1909
Should I keep trying with more patience or is there something wrong with my CNN?

Try switching your optimizer to Adam, which adapts the step size per parameter and is usually less sensitive to the initial learning rate than plain SGD. You can get Nesterov momentum as well by using Nadam. So I would try the following:
model.compile(loss='categorical_crossentropy',
              optimizer='nadam',
              metrics=['accuracy'])
This adapts the learning rates automatically, so you don't need to tune them as carefully.
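To illustrate why an adaptive optimizer is less sensitive to the gradient scale, here is a minimal pure-Python sketch of a single Adam update for one scalar parameter. It is not the Keras implementation; the function names `adam_first_step` and `sgd_step` and the default hyperparameters (the commonly quoted lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8) are assumptions for illustration.

```python
import math

def adam_first_step(grad, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the size of the very first Adam update for a scalar gradient."""
    m = (1 - beta1) * grad          # first-moment estimate (moments start at 0)
    v = (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1)         # bias correction at step t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)

def sgd_step(grad, lr=0.1):
    """Return the size of a plain SGD update (no momentum) for a scalar gradient."""
    return lr * grad

# Compare the two update sizes across very different gradient magnitudes.
for g in (0.001, 1.0, 1000.0):
    print(g, adam_first_step(g), sgd_step(g))
```

The first Adam step stays close to lr whatever the gradient magnitude, whereas the SGD step scales linearly with it. That is one reason a hand-picked SGD rate such as 0.1 may be worth revisiting when training stalls.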

Autokeras StructuredDataClassifier fails after a few trials

I'm using StructuredDataClassifier to train a model and I encounter the following error after a few trials.
Trial 3 Complete [00h 00m 23s]
val_accuracy: 0.9289383292198181
Best val_accuracy So Far: 0.9289383292198181
Total elapsed time: 00h 01m 02s
Search: Running Trial #4
Value |Best Value So Far |Hyperparameter
True |True |structured_data_block_1/normalize
False |False |structured_data_block_1/dense_block_1/use_batchnorm
2 |2 |structured_data_block_1/dense_block_1/num_layers
32 |32 |structured_data_block_1/dense_block_1/units_0
0 |0 |structured_data_block_1/dense_block_1/dropout
32 |32 |structured_data_block_1/dense_block_1/units_1
0 |0 |classification_head_1/dropout
adam |adam |optimizer
0.01 |0.001 |learning_rate
Epoch 1/1000
148/148 [==============================] - 2s 9ms/step - loss: 0.1917 - accuracy: 0.9576 - val_loss: 0.5483 - val_accuracy: 0.9289
Epoch 2/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1572 - accuracy: 0.9628 - val_loss: 0.3410 - val_accuracy: 0.9289
Epoch 3/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1434 - accuracy: 0.9628 - val_loss: 0.3330 - val_accuracy: 0.9289
Epoch 4/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1414 - accuracy: 0.9628 - val_loss: 0.3014 - val_accuracy: 0.9289
Epoch 5/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1395 - accuracy: 0.9628 - val_loss: 0.3012 - val_accuracy: 0.9289
Epoch 6/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1334 - accuracy: 0.9628 - val_loss: 0.4439 - val_accuracy: 0.9289
Epoch 7/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1370 - accuracy: 0.9628 - val_loss: 0.2964 - val_accuracy: 0.9289
Epoch 8/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1309 - accuracy: 0.9628 - val_loss: 0.2949 - val_accuracy: 0.9289
Epoch 9/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1282 - accuracy: 0.9628 - val_loss: 0.2927 - val_accuracy: 0.9289
Epoch 10/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1301 - accuracy: 0.9628 - val_loss: 0.2937 - val_accuracy: 0.9289
Epoch 11/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1278 - accuracy: 0.9628 - val_loss: 0.3152 - val_accuracy: 0.9289
Epoch 12/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1270 - accuracy: 0.9628 - val_loss: 0.3062 - val_accuracy: 0.9289
Epoch 13/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1286 - accuracy: 0.9628 - val_loss: 0.3198 - val_accuracy: 0.9289
Epoch 14/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1268 - accuracy: 0.9628 - val_loss: 0.3318 - val_accuracy: 0.9289
Epoch 15/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1244 - accuracy: 0.9628 - val_loss: 0.3038 - val_accuracy: 0.9289
Epoch 16/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1239 - accuracy: 0.9628 - val_loss: 0.3050 - val_accuracy: 0.9289
Epoch 17/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1222 - accuracy: 0.9628 - val_loss: 0.3180 - val_accuracy: 0.9289
Epoch 18/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1239 - accuracy: 0.9628 - val_loss: 0.3298 - val_accuracy: 0.9289
Epoch 19/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1220 - accuracy: 0.9628 - val_loss: 0.2916 - val_accuracy: 0.9289
Epoch 20/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1203 - accuracy: 0.9630 - val_loss: 0.3548 - val_accuracy: 0.9289
Epoch 21/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1243 - accuracy: 0.9628 - val_loss: 0.3047 - val_accuracy: 0.9289
Epoch 22/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1208 - accuracy: 0.9633 - val_loss: 0.4035 - val_accuracy: 0.9289
Epoch 23/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1242 - accuracy: 0.9628 - val_loss: 0.3383 - val_accuracy: 0.9289
Epoch 24/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1181 - accuracy: 0.9635 - val_loss: 0.3576 - val_accuracy: 0.9289
Epoch 25/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1171 - accuracy: 0.9641 - val_loss: 0.3221 - val_accuracy: 0.9289
Epoch 26/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1149 - accuracy: 0.9635 - val_loss: 0.3314 - val_accuracy: 0.9289
Epoch 27/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1136 - accuracy: 0.9635 - val_loss: 0.3554 - val_accuracy: 0.9289
Epoch 28/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1196 - accuracy: 0.9633 - val_loss: 0.3311 - val_accuracy: 0.9289
Epoch 29/1000
148/148 [==============================] - 1s 8ms/step - loss: 0.1176 - accuracy: 0.9635 - val_loss: 0.3684 - val_accuracy: 0.9289
Trial 4 Complete [00h 00m 36s]
val_accuracy: 0.9289383292198181
Best val_accuracy So Far: 0.9289383292198181
Total elapsed time: 00h 01m 37s
Search: Running Trial #5
Value |Best Value So Far |Hyperparameter
True |True |structured_data_block_1/normalize
False |False |structured_data_block_1/dense_block_1/use_batchnorm
2 |2 |structured_data_block_1/dense_block_1/num_layers
32 |32 |structured_data_block_1/dense_block_1/units_0
0 |0 |structured_data_block_1/dense_block_1/dropout
32 |32 |structured_data_block_1/dense_block_1/units_1
0 |0 |classification_head_1/dropout
adam_weight_decay |adam |optimizer
0.001 |0.001 |learning_rate
Epoch 1/1000
2022-12-11 16:22:23.607384: W tensorflow/core/framework/op_kernel.cc:1807] OP_REQUIRES failed at cast_op.cc:121 : UNIMPLEMENTED: Cast string to float is not supported
2022-12-11 16:22:23.607506: W tensorflow/core/framework/op_kernel.cc:1807] OP_REQUIRES failed at cast_op.cc:121 : UNIMPLEMENTED: Cast string to float is not supported
Traceback (most recent call last):
File "/home/anand/automl/automl.py", line 30, in <module>
clf.fit(x=X_train, y=y_train, use_multiprocessing=True, workers=8, verbose=True)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/tasks/structured_data.py", line 326, in fit
history = super().fit(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/tasks/structured_data.py", line 139, in fit
history = super().fit(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/auto_model.py", line 292, in fit
history = self.tuner.search(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/engine/tuner.py", line 193, in search
super().search(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras_tuner/engine/base_tuner.py", line 183, in search
results = self.run_trial(trial, *fit_args, **fit_kwargs)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras_tuner/engine/tuner.py", line 295, in run_trial
obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/engine/tuner.py", line 101, in _build_and_fit_model
_, history = utils.fit_with_adaptive_batch_size(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/utils/utils.py", line 88, in fit_with_adaptive_batch_size
history = run_with_adaptive_batch_size(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/utils/utils.py", line 101, in run_with_adaptive_batch_size
history = func(x=x, validation_data=validation_data, **fit_kwargs)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/utils/utils.py", line 89, in <lambda>
batch_size, lambda **kwargs: model.fit(**kwargs), **fit_kwargs
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/anand/automl/.venv/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:
Detected at node 'Cast_1' defined at (most recent call last):
File "/home/anand/automl/automl.py", line 30, in <module>
clf.fit(x=X_train, y=y_train, use_multiprocessing=True, workers=8, verbose=True)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/tasks/structured_data.py", line 326, in fit
history = super().fit(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/tasks/structured_data.py", line 139, in fit
history = super().fit(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/auto_model.py", line 292, in fit
history = self.tuner.search(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/engine/tuner.py", line 193, in search
super().search(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras_tuner/engine/base_tuner.py", line 183, in search
results = self.run_trial(trial, *fit_args, **fit_kwargs)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras_tuner/engine/tuner.py", line 295, in run_trial
obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/engine/tuner.py", line 101, in _build_and_fit_model
_, history = utils.fit_with_adaptive_batch_size(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/utils/utils.py", line 88, in fit_with_adaptive_batch_size
history = run_with_adaptive_batch_size(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/utils/utils.py", line 101, in run_with_adaptive_batch_size
history = func(x=x, validation_data=validation_data, **fit_kwargs)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/utils/utils.py", line 89, in <lambda>
batch_size, lambda **kwargs: model.fit(**kwargs), **fit_kwargs
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
tmp_logs = self.train_function(iterator)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
return step_function(self, iterator)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
outputs = model.train_step(data)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras/engine/training.py", line 1027, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
self.apply_gradients(grads_and_vars)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/autokeras/keras_layers.py", line 360, in apply_gradients
return super(AdamWeightDecay, self).apply_gradients(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
return super().apply_gradients(grads_and_vars, name=name)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 632, in apply_gradients
self._apply_weight_decay(trainable_variables)
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1159, in _apply_weight_decay
tf.__internal__.distribute.interim.maybe_merge_call(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1155, in distributed_apply_weight_decay
distribution.extended.update(
File "/home/anand/automl/.venv/lib/python3.10/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1151, in weight_decay_fn
wd = tf.cast(self.weight_decay, variable.dtype)
Node: 'Cast_1'
2 root error(s) found.
(0) UNIMPLEMENTED: Cast string to float is not supported
[[{{node Cast_1}}]]
(1) CANCELLED: Function was cancelled before it was started
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_70943]
This is my Python code
import tensorflow as tf
import pandas as pd
import numpy as np
import autokeras as ak
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder
data = pd.read_csv("p_feature_df.csv")
y = data.pop('is_p')
y = y.astype(np.int32)
data.pop('idx')
groups = data.pop('owner')
data = data.astype(np.float32)
X = data.to_numpy()
lb = LabelEncoder()
y = lb.fit_transform(y)
logo = LeaveOneGroupOut()
logo.get_n_splits(X,y,groups)
results = []
models = []
for train_index, test_index in logo.split(X, y, groups):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf = ak.StructuredDataClassifier(overwrite=True)
    clf.fit(x=X_train, y=y_train, use_multiprocessing=True, workers=8, verbose=True)
    loss, acc = clf.evaluate(x=X_test, y=y_test, verbose=True)
    results.append((loss, acc))
    models.append(clf)
    print((loss, acc))
The code fails when adam_weight_decay is used.
Same issue here.
I think it's related to a download that AutoKeras made in Colab but did not make on my local PC.
The file is:
https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50v2_weights_tf_dim_ordering_tf_kernels_notop.h5
I had the same problem. I solved it by creating an AutoModel (the base class) like this:
input_node = ak.StructuredDataInput()
output_node = ak.DenseBlock(use_batchnorm=True)(input_node)
output_node = ak.DenseBlock(dropout=0.1)(output_node)
# Chain from the previous block so the two blocks above stay in the graph.
output_node = ak.DenseBlock(use_batchnorm=True)(output_node)
output_node = ak.ClassificationHead()(output_node)
clf = ak.AutoModel(
    inputs=input_node, outputs=output_node, overwrite=True)
It seems to be a bug in StructuredDataClassifier.

Matplotlib Contour not Connected

I am a Python novice trying to visualize the curve X^2*Y + X*Y^2 - X^4 - Y^4 = 0 with Matplotlib:
from matplotlib.pyplot import *
from sympy import *
from numpy import *
delta = 0.025
p = arange(-0.5, 1.5, delta)
q = arange(-0.5, 1.5, delta)
X, Y = meshgrid(p, q)
Z = X**2*Y + X*Y**2 - X**4 - Y**4
fig, ax = subplots()
CS = ax.contour(X, Y, Z, [0], colors ='k')
ax.set_title('x**2*y + x*y**2 - x**4 - y**4')
show()
The resulting plot is not connected, whereas mathematically it should be. How can the level set be drawn connected?
It's a year later, but for future reference: you just have to choose a smaller step size delta. With your delta = 0.025 you get the disconnected picture, whereas with delta = 0.001 you get a more accurate, connected picture.
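A likely reason the coarse grid breaks the curve (my reading, not something stated in the answer) is that the origin is a singular point: both the function and its gradient vanish there, so several branches meet and the contouring needs a fine grid to resolve them. A minimal pure-Python check, with hypothetical helper names `f` and `grad_f`:

```python
def f(x, y):
    """Left-hand side of the curve x^2*y + x*y^2 - x^4 - y^4 = 0."""
    return x**2 * y + x * y**2 - x**4 - y**4

def grad_f(x, y):
    """Analytic gradient (df/dx, df/dy) of f."""
    dfdx = 2 * x * y + y**2 - 4 * x**3
    dfdy = x**2 + 2 * x * y - 4 * y**3
    return dfdx, dfdy

# The origin lies on the curve and the gradient vanishes there (singular point).
print(f(0.0, 0.0), grad_f(0.0, 0.0))
# (1, 1) also lies on the curve, but with a non-zero gradient (regular point).
print(f(1.0, 1.0), grad_f(1.0, 1.0))
```

Near the origin the lowest-order terms factor as x*y*(x + y), so three branches cross there. A grid spacing of 0.025 is probably too coarse to follow them, which would be consistent with the smaller delta fixing the picture.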

Training did not improve the model performance on validation data

I am trying to train my ResNet-50 network on a dataset of 5968 training images and 1492 validation images (746 classes, with 8 images per class for training and 2 images per class for validation). I used the ImageDataGenerator flow_from_directory method to get the labels from the folders.
My problem is that during training the training accuracy increases and the training loss decreases, which is good. However, the validation accuracy is very low (around 0.003) and does not improve. The validation loss is also very high and keeps oscillating at very high values.
Here is my code:
import numpy as np
from keras_preprocessing.image import ImageDataGenerator
from keras.utils.vis_utils import plot_model
import resnet
import json
from keras.callbacks import ModelCheckpoint, EarlyStopping
import keras
import pydot as pyd
keras.utils.vis_utils.pydot = pyd
data_path_l =".\\TRAIN\\left_750\\"
test_data_path_l =".\\TEST\\left_750\\"
num_classes=746
train_images=5968
val_images=1492
batch_size=32
epochs=500
img_channels=3
img_rows=224
img_cols=224
input_imgen = ImageDataGenerator(shear_range=0.2,
                                 zoom_range=0.2,
                                 rotation_range=5.,
                                 horizontal_flip=True)
valid_imgen = ImageDataGenerator()
train_it = input_imgen.flow_from_directory(directory=data_path_l,
                                           target_size=(img_rows, img_cols),
                                           color_mode="rgb",
                                           batch_size=batch_size,
                                           class_mode="categorical",
                                           shuffle=False)
valid_it = valid_imgen.flow_from_directory(directory=test_data_path_l,
                                           target_size=(img_rows, img_cols),
                                           color_mode="rgb",
                                           batch_size=batch_size,
                                           class_mode="categorical",
                                           shuffle=False)
model = resnet.ResnetBuilder.build_resnet_50((img_channels, img_rows, img_cols), num_classes)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
filepath = ".\\conv2D_models\\model-{epoch:02d}-{loss:.4f}.hdf5"
mc = ModelCheckpoint(filepath, save_weights_only=False, verbose=1,
                     monitor='loss', mode='min')
history = model.fit_generator(train_it,
                              steps_per_epoch=train_images // batch_size,
                              validation_data=valid_it,
                              validation_steps=val_images // batch_size,
                              epochs=epochs, callbacks=[mc],
                              shuffle=False)
model.save('resnet2D_1sample.h5')
and here is a part of training epochs:
Epoch 00059: saving model to .\conv2D_models\model-59-3.6342.hdf5
Epoch 60/500
186/186 [==============================] - 262s 1s/step - loss: 3.6074 - acc: 0.4078 - val_loss: 12.1131 - val_acc: 0.0034
Epoch 00060: saving model to .\conv2D_models\model-60-3.6084.hdf5
Epoch 61/500
186/186 [==============================] - 276s 1s/step - loss: 3.5681 - acc: 0.4236 - val_loss: 12.0455 - val_acc: 0.0034
Epoch 00061: saving model to .\conv2D_models\model-61-3.5683.hdf5
Epoch 62/500
186/186 [==============================] - 100s 536ms/step - loss: 3.4684 - acc: 0.4415 - val_loss: 10.2444 - val_acc: 0.0068
Epoch 00062: saving model to .\conv2D_models\model-62-3.4674.hdf5
Epoch 63/500
186/186 [==============================] - 96s 516ms/step - loss: 3.4523 - acc: 0.4414 - val_loss: 11.6459 - val_acc: 0.0062
Epoch 00063: saving model to .\conv2D_models\model-63-3.4530.hdf5
Epoch 64/500
186/186 [==============================] - 96s 516ms/step - loss: 3.3837 - acc: 0.4782 - val_loss: 12.3293 - val_acc: 0.0062
Epoch 00064: saving model to .\conv2D_models\model-64-3.3847.hdf5
Epoch 65/500
186/186 [==============================] - 96s 515ms/step - loss: 3.2915 - acc: 0.5045 - val_loss: 12.8812 - val_acc: 0.0034
Epoch 00065: saving model to .\conv2D_models\model-65-3.2928.hdf5
Epoch 66/500
186/186 [==============================] - 96s 517ms/step - loss: 3.2506 - acc: 0.5129 - val_loss: 13.2886 - val_acc: 0.0034
Epoch 00066: saving model to .\conv2D_models\model-66-3.2527.hdf5
Epoch 67/500
186/186 [==============================] - 96s 515ms/step - loss: 3.2511 - acc: 0.5123 - val_loss: 14.4090 - val_acc: 0.0034
Epoch 00067: saving model to .\conv2D_models\model-67-3.2530.hdf5
Epoch 68/500
186/186 [==============================] - 97s 519ms/step - loss: 3.2632 - acc: 0.5163 - val_loss: 16.2364 - val_acc: 0.0027
Epoch 00068: saving model to .\conv2D_models\model-68-3.2650.hdf5
Epoch 69/500
186/186 [==============================] - 96s 517ms/step - loss: 3.1477 - acc: 0.5585 - val_loss: 16.2729 - val_acc: 0.0021
Epoch 00069: saving model to .\conv2D_models\tmodel-69-3.1487.hdf5
Epoch 70/500
186/186 [==============================] - 96s 516ms/step - loss: 2.9347 - acc: 0.6099 - val_loss: 16.7732 - val_acc: 0.0014
Epoch 00070: saving model to .\conv2D_models\model-70-2.9369.hdf5
Epoch 71/500
186/186 [==============================] - 96s 515ms/step - loss: 2.7118 - acc: 0.6715 - val_loss: 15.4640 - val_acc: 0.0075
Epoch 00071: saving model to .\conv2D_models\model-71-2.7134.hdf5
Epoch 72/500
186/186 [==============================] - 96s 517ms/step - loss: 2.6145 - acc: 0.6835 - val_loss: 16.2367 - val_acc: 0.0055
Epoch 00072: saving model to .\conv2D_models\model-72-2.6159.hdf5
Epoch 73/500
186/186 [==============================] - 96s 517ms/step - loss: 2.5492 - acc: 0.6816 - val_loss: 16.8155 - val_acc: 0.0000e+00
Epoch 00073: saving model to .\conv2D_models\model-73-2.5503.hdf5
Epoch 74/500
186/186 [==============================] - 96s 516ms/step - loss: 2.5743 - acc: 0.6786 - val_loss: 14.1867 - val_acc: 0.0021
Epoch 00074: saving model to .\conv2D_models\model-74-2.5759.hdf5
Epoch 75/500
186/186 [==============================] - 96s 516ms/step - loss: 2.5295 - acc: 0.6962 - val_loss: 12.3790 - val_acc: 0.0055
Could anyone suggest some potential reasons for this strange training behavior? It has been blocking me for a week.
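As a reference point for reading the log above, it may help to compute what a classifier guessing uniformly at random would score with 746 classes. This is only a baseline sketch and not a diagnosis; the variable names are illustrative.

```python
import math

num_classes = 746  # taken from the question

# Accuracy and categorical cross-entropy of a uniform random guess.
chance_accuracy = 1.0 / num_classes
chance_loss = math.log(num_classes)

print(f"chance accuracy: {chance_accuracy:.4f}")
print(f"chance loss:     {chance_loss:.4f}")
```

The reported val_acc values sit in the neighborhood of this chance accuracy, while the reported val_loss (roughly 10 to 17) is well above the uniform-guess loss. That would suggest confidently wrong predictions on the validation data, so the validation pipeline and label mapping may be worth checking before the model itself.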

single layer net in keras with imagedatagenerator, but loss is always negative

I have tried many kinds of networks, but even with a basic single-layer network, the loss (set to binary_crossentropy) is always negative.
Here is the code:
from __future__ import print_function
import numpy as np
import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
import os
import cv2
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
train_path = 'D:/rectangle'
val_path = 'D:/rectang'
model = Sequential()
model.add(Conv2D(32, 1, 1, input_shape=(230, 230, 3)))
model.add(Flatten())
model.add(Dense(64))
model.add(Dropout(0.5))
model.add(Dense(1))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
train_datagen = ImageDataGenerator(
    samplewise_center=True,
    samplewise_std_normalization=True)
test_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(
    train_path,
    target_size=(230, 230),
    batch_size=32,
    class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
    val_path,
    target_size=(230, 230),
    batch_size=32,
    class_mode='binary')
model.fit_generator(
    train_generator,
    steps_per_epoch=200,
    epochs=50,
    validation_data=validation_generator,
    nb_val_samples=800
)
Here is the training output:
1/200 [..............................] - ETA: 20:17 - loss: 12.9030 - acc: 0.1250
2/200 [..............................] - ETA: 10:22 - loss: -2.0179 - acc: 0.0625
3/200 [..............................] - ETA: 7:03 - loss: -6.3273 - acc: 0.0417
4/200 [..............................] - ETA: 5:23 - loss: -7.8592 - acc: 0.0312
5/200 [..............................] - ETA: 4:24 - loss: -8.6776 - acc: 0.0250
6/200 [..............................] - ETA: 3:44 - loss: -9.5563 - acc: 0.0208
7/200 [>.............................] - ETA: 3:15 - loss: -9.3298 - acc: 0.0179
8/200 [>.............................] - ETA: 2:54 - loss: -9.3455 - acc: 0.0156
9/200 [>.............................] - ETA: 2:37 - loss: -10.2439 - acc: 0.0139
10/200 [>.............................] - ETA: 2:24 - loss: -10.5647 - acc: 0.0125
11/200 [>.............................] - ETA: 2:13 - loss: -10.8719 - acc: 0.0114
12/200 [>.............................] - ETA: 2:04 - loss: -11.3775 - acc: 0.0104
13/200 [>.............................] - ETA: 1:56 - loss: -11.3066 - acc: 0.0096
14/200 [=>............................] - ETA: 1:49 - loss: -11.4598 - acc: 0.0089
15/200 [=>............................] - ETA: 1:48 - loss: -11.4930 - acc: 0.0083
16/200 [=>............................] - ETA: 1:47 - loss: -11.6465 - acc: 0.0078
17/200 [=>............................] - ETA: 1:51 - loss: -11.6061 - acc: 0.0074
The input images are breast cancer histopathological images, 460*460 in size; there are 20000 pictures in PNG format.
I would appreciate any help in solving this!
Since you are doing binary classification (judging by your loss), your last activation function should be sigmoid. So instead of
model.add(Dense(1))
your last layer should look like:
model.add(Dense(1, activation='sigmoid'))
Without specifying it, the activation is linear by default, which fits a regression scenario rather than classification.
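To make the reasoning concrete, here is a small pure-Python sketch of the binary cross-entropy formula (not the Keras implementation; the helper names are illustrative). It shows that a sigmoid output with labels in {0, 1} keeps the loss non-negative. The last line rests on an assumption of mine that is not in the question: if the labels are not strictly 0/1 (for example because the data directory contains more than two class folders), the same formula can go negative, so the labels may be worth checking too.

```python
import math

def sigmoid(z):
    """Squash any real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y, p):
    """Binary cross-entropy for one sample with label y and predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# With a sigmoid output and labels in {0, 1}, the loss is never negative.
for score in (-3.0, 0.0, 3.0):
    p = sigmoid(score)
    print(score, binary_cross_entropy(0, p), binary_cross_entropy(1, p))

# Assumption for illustration: a label outside [0, 1], such as 2,
# pushes the same formula below zero.
print(binary_cross_entropy(2, 0.9))
```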

Checking validation results in Keras shows only 50% correct. Clearly random

I'm struggling with a seemingly simple problem: I can't figure out how to match my input images to the probabilities produced by my model.
Training and validation of my model (vanilla VGG16, retrained for 2 classes, dogs and cats) are going fine, getting me close to 97% validation accuracy, but when I run a check to see what I got right and what I got wrong, I only get random results.
Found 1087 correct labels (53.08%)
I am pretty sure it has something to do with the ImageDataGenerator, which produces random batches from my validation images, although I DO set shuffle = False.
I save the filenames and classes of my generator before I run it, and I ASSUME that the order of my filenames and classes is the same as the order of the output probabilities.
Here's my setup (vanilla VGG16, with the last layer replaced to match 2 categories for cats and dogs):
new_model.summary()
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) (None, 224, 224, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
fc1 (Dense) (None, 4096) 102764544
_________________________________________________________________
fc2 (Dense) (None, 4096) 16781312
_________________________________________________________________
Binary_predictions (Dense) (None, 2) 8194
=================================================================
Total params: 134,268,738
Trainable params: 8,194
Non-trainable params: 134,260,544
_________________________________________________________________
batch_size=16
epochs=3
learning_rate=0.01
This is the definition of the generators, for training and validation. I did not yet include the data augmentation part at this point.
train_datagen = ImageDataGenerator()
validation_datagen = ImageDataGenerator()
test_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(
train_path,
target_size=(img_height, img_width),
batch_size=batch_size,
class_mode='categorical')
train_filenames = train_generator.filenames
train_samples = len(train_filenames)
validation_generator = validation_datagen.flow_from_directory(
valid_path,
target_size=(img_height, img_width),
batch_size=batch_size,
class_mode='categorical',
shuffle = False) #Needs to be False so I can extract the classes and filenames in the order in which they are predicted
validation_filenames = validation_generator.filenames
validation_samples = len(validation_filenames)
Fine-tuning the model goes fine:
#Fine-tune the model
#DOC: fit_generator(generator, steps_per_epoch, epochs=1, verbose=1, callbacks=None,
# validation_data=None, validation_steps=None, class_weight=None,
# max_queue_size=10, workers=1, use_multiprocessing=False, initial_epoch=0)
new_model.fit_generator(
train_generator,
steps_per_epoch=train_samples // batch_size,
epochs=epochs,
validation_data=validation_generator,
validation_steps=validation_samples // batch_size)
Epoch 1/3
1434/1434 [==============================] - 146s - loss: 0.5456 - acc: 0.9653 - val_loss: 0.5043 - val_acc: 0.9678
Epoch 2/3
1434/1434 [==============================] - 148s - loss: 0.5312 - acc: 0.9665 - val_loss: 0.4293 - val_acc: 0.9722
Epoch 3/3
1434/1434 [==============================] - 148s - loss: 0.5332 - acc: 0.9665 - val_loss: 0.4329 - val_acc: 0.9731
As is the extraction of the validation data
#We need the probabilities/scores for the validation set
#DOC: predict_generator(generator, steps, max_queue_size=10, workers=1,
# use_multiprocessing=False, verbose=0)
probs = new_model.predict_generator(
validation_generator,
steps=validation_samples // batch_size,
verbose = 1)
#Extracting the probabilities and labels
our_predictions = probs[:,0]
our_labels = np.round(1-our_predictions)
expected_labels = validation_generator.classes
Now, when I calculate the success of my validation set by comparing the expected labels and the calculated labels, I get something that is suspiciously close to random:
correct = np.where(our_labels==expected_labels)[0]
print("Found {:3d} correct labels ({:.2f}%)".format(len(correct),
100*len(correct)/len(our_predictions)))
Found 1087 correct labels (53.08%)
Clearly this is not correct.
I suspect this has something to do with the randomness of the generators, but I set shuffle = False.
This code was copied DIRECTLY from the Fast.ai course by the great Jeremy Howard, but I can't get it to work anymore.
I am using Keras 2.0.8 with the TensorFlow 1.3 backend on Python 3.5 under Anaconda.
Please help me retain my sanity!
You need to call validation_generator.reset() in between fit_generator() and predict_generator().
In *_generator() functions, data batches are inserted into a queue before being used to fit/evaluate the model. The underlying queue is always kept full, so there will be some extra batches in the queue when training ends. You can verify it by printing validation_generator.batch_index after training. Therefore, your predict_generator() does not start with the first batch, and probs[0] is not the prediction of the first image. That's why our_labels does not align with expected_labels and the accuracy is low.
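The misalignment described above can be reproduced without Keras. The following is a toy sketch, not Keras's actual implementation: ToyIterator is a hypothetical stand-in for a non-shuffling generator that tracks a batch_index and wraps around, and the single next() call stands in for a batch that had already been drawn when training ended.

```python
import numpy as np


class ToyIterator:
    """Hypothetical stand-in for a directory iterator with shuffle=False."""

    def __init__(self, data, batch_size):
        self.data = np.asarray(data)
        self.batch_size = batch_size
        self.batch_index = 0

    def reset(self):
        # Start again from the first batch.
        self.batch_index = 0

    def __next__(self):
        n_batches = int(np.ceil(len(self.data) / self.batch_size))
        start = self.batch_index * self.batch_size
        batch = self.data[start:start + self.batch_size]
        # Wrap around after the last batch, as an endless generator would.
        self.batch_index = (self.batch_index + 1) % n_batches
        return batch


def predict_all(iterator, steps):
    """Identity 'model': returns the inputs in the order they were drawn."""
    return np.concatenate([next(iterator) for _ in range(steps)])


labels = np.arange(8)          # expected order, like validation_generator.classes
it = ToyIterator(labels, batch_size=2)

next(it)                       # one batch already consumed before predicting
shifted = predict_all(it, steps=4)

it.reset()                     # the fix: rewind before predicting
aligned = predict_all(it, steps=4)

print(it.batch_index)          # back at 0 after a full pass
print(shifted)                 # [2 3 4 5 6 7 0 1]
print(aligned)                 # [0 1 2 3 4 5 6 7]
```

In the question's code, the corresponding fix is to call validation_generator.reset() immediately before new_model.predict_generator(...).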
BTW, unless validation_samples is a multiple of batch_size, validation_steps=validation_samples // batch_size skips the final partial batch in each epoch, so your model is evaluated on a (slightly) different dataset in each epoch. In that case you should use validation_steps=validation_samples // batch_size + 1 (the same applies to steps_per_epoch for the training generator).
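A small sketch of the step arithmetic, using a hypothetical helper steps_for() that rounds up so the final partial batch is counted exactly once. It assumes the generator yields a shorter last batch when the sample count is not a multiple of the batch size. The sample counts are illustrative.

```python
import math


def steps_for(n_samples, batch_size):
    """Number of batches needed to see every sample exactly once."""
    return math.ceil(n_samples / batch_size)


batch_size = 16
for n_samples in (2048, 2050):
    floor_steps = n_samples // batch_size
    # Columns: samples, floor, floor + 1, rounded up
    print(n_samples, floor_steps, floor_steps + 1, steps_for(n_samples, batch_size))

# 2048 128 129 128   (exact multiple: "+ 1" would draw one batch too many)
# 2050 128 129 129   (remainder of 2: plain "//" would drop the last 2 samples)
```

Rounding up gives the right count in both cases, whereas "// batch_size + 1" is only right when there is a remainder.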
I ran into a similar problem before. I found predict_generator() awkward to work with, so I wrote a function that tests the data set one image at a time.
Here is my code snippet:
import json
import os

import numpy as np
from PIL import Image

# img_width, img_height and new_model are the ones defined in the question.
def get_img_result(img_path):
    image = Image.open(img_path)
    image.load()
    image = image.resize((img_width, img_height))
    if image.mode != 'RGB':  # compare strings with !=, not "is not"
        image = image.convert('RGB')
    array = np.asarray(image, dtype='int32')
    # Rescale to [0, 1]; this must match the preprocessing used during training.
    array = array / 255
    array = np.asarray([array])  # add the batch dimension
    result = new_model.predict(array)
    print(result)
    return result

# path: the root folder of the validation data set. validation->cat->kitty.jpg
def validate(path):
    result_list = []
    right_count = 0
    wrong_count = 0
    # Sorted so that index i matches the class index flow_from_directory assigns.
    categories = sorted(os.listdir(path))
    for i in range(len(categories)):
        images = os.listdir(os.path.join(path, categories[i]))
        for image in images:
            result = get_img_result(os.path.join(path, categories[i], image))[0]
            # The prediction is right if the true class has the highest score.
            if result[i] != max(result):
                result_list.append({'image': image, 'category': categories[i], 'score': result.tolist(), 'right': 0})
                wrong_count = wrong_count + 1
            else:
                result_list.append({'image': image, 'category': categories[i], 'score': result.tolist(), 'right': 1})
                right_count = right_count + 1
    json_string = json.dumps(result_list)
    with open('result.json', 'w') as f:
        f.write(json_string)
    print('right count : {0} \n wrong count : {1} \n accuracy : {2}'.format(
        right_count, wrong_count, right_count / (right_count + wrong_count)))
I use PIL to convert each image to a NumPy array, as Keras does, run the model on every image, and save the results to a JSON file.
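Once the results are saved, they can be summarized per category. This is a minimal sketch, assuming the record layout written by validate() above ('category' and 'right' keys). The helper name per_category_accuracy and the sample records are made up for illustration; in practice the list would come from json.load() on result.json.

```python
from collections import defaultdict


def per_category_accuracy(records):
    """Fraction of correctly classified images for each category."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec['category']] += 1
        hits[rec['category']] += rec['right']
    return {cat: hits[cat] / totals[cat] for cat in totals}


# Illustrative records only, in the same shape as the entries in result.json.
sample = [
    {'image': 'a.jpg', 'category': 'cat', 'score': [0.9, 0.1], 'right': 1},
    {'image': 'b.jpg', 'category': 'cat', 'score': [0.3, 0.7], 'right': 0},
    {'image': 'c.jpg', 'category': 'dog', 'score': [0.2, 0.8], 'right': 1},
]

summary = per_category_accuracy(sample)
print(summary)  # {'cat': 0.5, 'dog': 1.0}
```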
Hope it helps.

Resources