Image classification: validation loss stuck during training with Inception (v1)

I have built a small custom image classification training/val dataset with 4 classes.
The training dataset has ~110,000 images.
The validation dataset has ~6,000 images.
The problem I'm experiencing is that, during training, both training accuracy (measured as an average over the last training steps) and training loss improve, while validation accuracy and validation loss stay the same.
This only occurs when I use the Inception and ResNet models; if I use an AlexNet model on the same training and validation data, the validation loss and accuracy improve.
In my experiments I am using several convolutional architectures imported from tensorflow.contrib.slim.nets.
The code is organized as follows:
...
images, labels = preprocessing(..., train=True)
val_images, val_labels = preprocessing(..., train=False)
...
# AlexNet model
with slim.arg_scope(alexnet.alexnet_v2_arg_scope()):
    logits, _ = alexnet.alexnet_v2(images, ..., is_training=True)
    tf.get_variable_scope().reuse_variables()
    val_logits, _ = alexnet.alexnet_v2(val_images, ..., is_training=False)

# Inception v1 model
with slim.arg_scope(inception_v1_arg_scope()):
    logits, _ = inception_v1(images, ..., is_training=True)
    val_logits, _ = inception_v1(val_images, ..., is_training=False, reuse=True)
loss = my_stuff.loss(logits, labels)
val_loss = my_stuff.loss(val_logits, val_labels)
training_accuracy_op = tf.nn.in_top_k(logits, labels, 1)
top_1_op = tf.nn.in_top_k(val_logits, val_labels, 1)
train_op = ...
...
Instead of using a separate eval script, I run the validation step at the end of each epoch. For debugging purposes I also run an early validation step (before training), and I check the training accuracies by averaging the training predictions over the last x steps.
When I use the Inception v1 model (commenting out the AlexNet one), the logger output after one epoch is as follows:
early Validation Step
precision @ 1 = 0.2440 val loss = 1.39
Starting epoch 0
step 50, loss = 1.38, training_acc = 0.3250
...
step 1000, loss = 0.58, training_acc = 0.6725
...
step 3550, loss = 0.45, training_acc = 0.8063
Validation Step
precision @ 1 = 0.2473 val loss = 1.39
As shown, training accuracy and loss improve a lot after one epoch, but the validation loss doesn't change at all. I have tested this at least 10 times; the result is always the same. I would understand if the validation loss were getting worse due to overfitting, but in this case it's not changing at all.
To rule out any problems with the validation data, I'm also presenting the results of training with the AlexNet implementation in slim. Training with the AlexNet model produces the following output:
early Validation Step
precision @ 1 = 0.2448 val loss = 1.39
Starting epoch 0
step 50, loss = 1.39, training_acc = 0.2587
...
step 350, loss = 1.38, training_acc = 0.2919
...
step 850, loss = 1.28, training_acc = 0.3898
Validation Step
precision @ 1 = 0.4069 val loss = 1.25
Accuracy and loss, on both the training and the validation data, correctly improve when using the AlexNet model, and they keep improving in subsequent epochs.
I don't understand what the cause of the problem may be, or why it shows up with the Inception/ResNet models but not when training with AlexNet.
Does anyone have ideas?

After searching through forums, reading various threads, and experimenting, I found the root of the problem.
The problem was using a train_op that was basically recycled from another example. It worked well with the AlexNet model, but didn't work with the other models because it was missing the batch normalization updates.
To fix this, I had to use either
optimizer = tf.train.GradientDescentOptimizer(0.005)
train_op = slim.learning.create_train_op(total_loss, optimizer)
or
train_op = tf.contrib.layers.optimize_loss(total_loss, global_step, .005, 'SGD')
This seems to take care of running the batch-norm update ops as part of the training step.
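For reference, a minimal sketch of doing this manually (TF 1.x, assuming total_loss is defined as above): the moving-average updates that slim's batch_norm layers register in the UPDATE_OPS collection must run together with the gradient step.

# Sketch: collect the batch-norm moving-average updates and make the
# optimizer step depend on them.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.GradientDescentOptimizer(0.005).minimize(total_loss)

slim.learning.create_train_op adds these update ops by default, which is why the first option above works.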
The problem still persisted for short training runs because of the slow moving-average updates.
The default slim arg_scope has the batch-norm decay set to 0.9997, which is stable but apparently needs many steps to converge. Using the same arg_scope but with the decay set to 0.99 or 0.9 did help in this short training scenario.
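As an illustration only, here is a sketch of such a modified arg_scope (the name my_inception_v1_arg_scope, the default decay of 0.9, and the exact parameter set are hypothetical; the stock slim version hard-codes decay = 0.9997):

def my_inception_v1_arg_scope(weight_decay=0.00004, batch_norm_decay=0.9):
    # Hypothetical copy of slim's inception_v1_arg_scope with a configurable
    # batch-norm moving-average decay for short training runs.
    batch_norm_params = {'decay': batch_norm_decay,
                         'epsilon': 0.001,
                         'updates_collections': tf.GraphKeys.UPDATE_OPS}
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        weights_regularizer=slim.l2_regularizer(weight_decay)):
        with slim.arg_scope([slim.conv2d],
                            activation_fn=tf.nn.relu,
                            normalizer_fn=slim.batch_norm,
                            normalizer_params=batch_norm_params) as sc:
            return sc

It would then take the place of inception_v1_arg_scope() in the with block shown earlier.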

It seems you are using raw logits to calculate the validation loss; using the softmax predictions instead may help:
val_logits, _ = inception_v1(val_images, ..., is_training=False, reuse=True)
val_logits = tf.nn.softmax(val_logits)

Related

Tuning max_depth in Random Forest using CARET

I'm building a random forest with the caret package in R using method = "rf". I see that every random forest variant in caret seems to tune only mtry, the number of features randomly selected for each tree. I don't understand why max_depth of each tree is not a tunable parameter (as it is for CART); in my mind it is a parameter that can limit over-fitting.
For example, my random forest performs much better on the training data than on the test data:
model <- train(
    group ~ ., data = train.data, method = "rf",
    trControl = trainControl("repeatedcv", number = 5, repeats = 10),
    tuneLength = 5
)
> postResample(fitted(model), train.data$group)
 Accuracy     Kappa
0.9574592 0.9745841
> postResample(predict(model, test.data), test.data$group)
 Accuracy     Kappa
0.7333333 0.5428571
As you can see, my model is clearly over-fitting. I have tried a lot of different things to handle this, but nothing has worked: I always get around 0.7 accuracy on the test data and 0.95 on the training data. This is why I want to tune other parameters.
I cannot share my data to reproduce this.

Even an image from the data set used for training gives opposite values when making a prediction

I am new to ML and TensorFlow. I am trying to build a CNN to classify good images versus corrupted images, similar to the rock-paper-scissors tutorial in TensorFlow, except with only two categories.
The Colab Notebook
Model Architecture
train_generator = training_datagen.flow_from_directory(
    TRAINING_DIR,
    target_size=(150, 150),
    class_mode='categorical'
)

validation_generator = validation_datagen.flow_from_directory(
    VALIDATION_DIR,
    target_size=(150, 150),
    class_mode='categorical'
)
model = tf.keras.models.Sequential([
    # Note the input shape is the desired size of the image 150x150 with 3 bytes color
    # This is the first convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The third convolution
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The fourth convolution
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])

model.summary()

model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

history = model.fit_generator(train_generator, epochs=25, validation_data=validation_generator, verbose=1)

model.save("rps.h5")
The only changes I made were turning the input shape from (150,150,1) to (150,150,3) and changing the last layer's output from 3 neurons to 2. Training consistently gave me accuracy above 90% on a data set of 600 images per class. But when I make a prediction using the code in the tutorial, it gives me highly wrong values even for images that are in the data set.
PREDICTION
Original code in TensorFlow tutorial
for file in onlyfiles:
    path = file
    img = image.load_img(path, target_size=(150, 150, 3))  # changed target_size to (150, 150, 3) from (150, 150)
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    images = np.vstack([x])
    classes = model.predict(images, batch_size=10)
    print(file)
    print(classes)
I changed target_size from (150, 150) to (150, 150, 3) in the belief that it was needed since my input is a 3-channel image.
Result
It gives very wrong values, [0,1][0,1], even for images that are in the data set.
But when I changed the code to this:
for file in onlyfiles:
    path = file
    img = image.load_img(path, target_size=(150, 150, 3))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x /= 255.
    classes = model.predict(x, batch_size=10)  # predict on the normalized array
    print(file)
    print(classes)
In this case the values come out like this:
[[9.9999774e-01 2.2242968e-06]]
[[9.9999785e-01 2.1864464e-06]]
[[9.9999785e-01 2.1641024e-06]]
There are one or two errors, but it is mostly correct.
So my question is: even though the last activation is softmax, why is the output now coming out as decimal values? Is there any logical mistake in the way I am making predictions? I also tried binary mode, but couldn't see much difference.
Please note:
Changing the number of output classes from 2 to 3 asks the model to categorise into 3 classes. That would contradict your problem statement, which separates good and corrupted images, i.e. 2 output classes (a binary problem). I think it can be reverted from 3 to 2, if I have understood the question correctly.
Second, the output you are getting is perfectly correct: neural network models output probabilities instead of absolute class values like 0 or 1. The probability tells you how likely the input is to belong to, say, class 0 or class 1.
Also, as mentioned above by @BBloggsbott, you just have to use np.argmax on the output array, which gives you the index of the class with the highest probability.
Hope this helps.
Thanks.
Softmax returns a probability distribution over the classes for the vector it gets as input, so the fact that you are getting decimal values is not a problem. If you want to find the exact class each image belongs to, use the argmax function on the predictions.
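For example, a minimal sketch, assuming model and a preprocessed batch x as in the prediction code above:

import numpy as np

probs = model.predict(x)                      # softmax output, e.g. [[9.99e-01, 2.2e-06]]
predicted_class = np.argmax(probs, axis=-1)   # index of the most probable class
print(predicted_class)                        # e.g. [0]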

Copying embeddings for gensim word2vec

I wanted to see if I could simply set new weights for gensim's Word2Vec without training. I got the 20 Newsgroups data set from scikit-learn (from sklearn.datasets import fetch_20newsgroups) and trained an instance of Word2Vec on it:
model_w2v = models.Word2Vec(sg = 1, size=300)
model_w2v.build_vocab(all_tokens)
model_w2v.train(all_tokens, total_examples=model_w2v.corpus_count, epochs = 30)
Here all_tokens is the tokenized data set.
Then I created a new instance of Word2Vec without training:
model_w2v_new = models.Word2Vec(sg = 1, size=300)
model_w2v_new.build_vocab(all_tokens)
and set the embeddings of the new Word2Vec equal to those of the first one:
model_w2v_new.wv.vectors = model_w2v.wv.vectors
Most of the functions work as expected, e.g.
model_w2v.wv.similarity( w1='religion', w2 = 'religions')
> 0.4796233
model_w2v_new.wv.similarity( w1='religion', w2 = 'religions')
> 0.4796233
and
model_w2v.wv.words_closer_than(w1='religion', w2 = 'judaism')
> ['religions']
model_w2v_new.wv.words_closer_than(w1='religion', w2 = 'judaism')
> ['religions']
and
entities_list = list(model_w2v.wv.vocab.keys())
entities_list.remove('religion')
model_w2v.wv.most_similar_to_given(entity1='religion',entities_list = entities_list)
> 'religions'
model_w2v_new.wv.most_similar_to_given(entity1='religion',entities_list = entities_list)
> 'religions'
However, most_similar doesn't work:
model_w2v.wv.most_similar(positive=['religion'], topn=3)
> [('religions', 0.4796232581138611),
>  ('judaism', 0.4426296651363373),
>  ('theists', 0.43141329288482666)]
model_w2v_new.wv.most_similar(positive=['religion'], topn=3)
> [('roderick', 0.22643062472343445),
>  ('nci', 0.21744996309280396),
>  ('soviet', 0.20012077689170837)]
What am I missing?
Disclaimer: I posted this question on datascience.stackexchange but got no response; hoping to have better luck here.
Generally, your approach should work.
It's likely that the specific problem you're encountering was caused by an extra probing step you took that is not shown in your code, because you had no reason to think it significant: some sort of most_similar()-like operation on model_w2v_new after its build_vocab() call but before the later, malfunctioning operations.
Traditionally, most_similar() calculations operate on a version of the vectors that has been normalized to unit-length. The 1st time these unit-normed vectors are needed, they're calculated – and then cached inside the model. So, if you then replace the raw vectors with other values, but don't discard those cached values, you'll see results like you're reporting – essentially random, reflecting the randomly-initialized-but-never-trained starting vector values.
If this is what happened, just discarding the cached values should cause the next most_similar() to refresh them properly, and then you should get the results you expect:
model_w2v_new.wv.vectors_norm = None
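A minimal follow-up sketch, assuming the models from the question; after clearing the cache, the next most_similar() call recomputes the unit-normed vectors from the copied raw vectors:

model_w2v_new.wv.vectors_norm = None   # drop the cached unit-normed vectors
print(model_w2v_new.wv.most_similar(positive=['religion'], topn=3))
# expected to now match model_w2v.wv.most_similar(positive=['religion'], topn=3)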

Using tf.metrics.mean_iou during training

I want to train a model using the TensorFlow Estimator API and want to track multiple metrics during training and evaluation. The metrics I want to track are accuracy and mean intersection-over-union (and my loss).
I managed to figure out how to track the accuracy during training:
if mode == tf.estimator.ModeKeys.TRAIN:
    ...
    accuracy = tf.metrics.accuracy(labels=indices_ground_truth, predictions=indices_prediction, name='acc_op')
    tf.summary.scalar('accuracy', accuracy[1])
and evaluation:
if mode == tf.estimator.ModeKeys.EVAL:
    ...
    accuracy = tf.metrics.accuracy(labels=indices_ground_truth, predictions=indices_prediction)
    eval_metric_ops = {'accuracy': accuracy}
    return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=eval_metric_ops)
For evaluation, the mean intersection-over-union works the same way, so it's actually:
if mode == tf.estimator.ModeKeys.EVAL:
    ...
    miou = tf.metrics.mean_iou(labels=indices_ground_truth, predictions=indices_prediction, num_classes=13)
    accuracy = tf.metrics.accuracy(labels=indices_ground_truth, predictions=indices_prediction)
    eval_metric_ops = {'miou': miou,
                       'accuracy': accuracy}
    return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=eval_metric_ops)
As far as I know, I have to run the update operation (the second return value) during training; otherwise the metric returns 0 every time. For a single scalar value like the accuracy, that works.
But for the mIoU, the second return value is the update operation of the confusion matrix used to calculate the mIoU. That's a [numClass, numClass] tensor. If I try to track it like the accuracy with tf.summary.scalar('miou', miou[1]), it crashes because a [numClass, numClass] tensor is not a scalar.
tf.summary.scalar('miou', miou[0]) gives me 0s every time.
So how can I pass the mIoU to the summary?
Here is how I calculate the IoU while training:
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(predict, raw_gt, num_classes=2, weights=None)
tf.summary.scalar('meanIoU', mIoU)
confusion_matrix, _ = sess.run([update_op, train_op], feed_dict=feed_dict)
iou = sess.run(mIoU)
print('iou score = {:.3f}, ({:.3f} sec/step)'.format(iou, duration))
You don't need to track the confusion-matrix output to get the IoU onto TensorBoard; the above works fine for me. I think what you are missing is actually running the tensors in your session: you need to run the update op, e.g. sess.run(update_op), during training, and then evaluate the metric value with sess.run(mIoU).
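As a minimal sketch of the same idea for the training branch of a model function (TF 1.x; indices_ground_truth, indices_prediction, and train_op as in the question), the confusion-matrix update op can be grouped with the train op so it runs on every step, while the scalar metric value goes to the summary:

miou, miou_update_op = tf.metrics.mean_iou(labels=indices_ground_truth,
                                           predictions=indices_prediction,
                                           num_classes=13)
tf.summary.scalar('miou', miou)                 # miou is a scalar tensor, so this is valid
train_op = tf.group(train_op, miou_update_op)   # update the confusion matrix each step

Note that the metric's accumulators are local variables, so tf.local_variables_initializer() needs to be run before training.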

Can I perform Keras training in a deterministic manner?

I'm using a Keras Sequential model where the inputs and labels are exactly the same each run. Keras is using a Tensorflow backend.
I've set the layer weight and bias initializers to 'zeros' and disabled batch shuffling during training.
model = Sequential()
model.add(Dense(128,
                activation='relu',
                kernel_initializer='zeros',
                bias_initializer='zeros'))
...
model.compile(optimizer='rmsprop', loss='binary_crossentropy')

model.fit(x_train, y_train,
          batch_size=128, verbose=1, epochs=200,
          validation_data=(x_validation, y_validation),
          shuffle=False)
I've also tried seeding NumPy's random number generator:
np.random.seed(7) # fix random seed for reproducibility
With the above in place I still receive different accuracy and loss values after training.
Am I missing something or is there no way to fully remove the variance between trainings?
Since this seems to be a real issue, as commented before, maybe you could manually initialize your weights (instead of relying on the 'zeros' initializer passed to the layer constructor):
#where you see layers[0], it's possible that the correct layer is layers[1] - I can't test at this moment.
weights = model.layers[0].get_weights()
ws = np.zeros(weights[0].shape)
bs = np.zeros(weights[1].shape)
model.layers[0].set_weights([ws,bs])
It seems the problem occurs in training and not initialization. You can check this by first initializing two models model1 and model2 and running the following code:
w1 = model1.get_weights()
w2 = model2.get_weights()

for i in range(len(w1)):
    w1i = w1[i]
    w2i = w2[i]
    assert np.allclose(w1i, w2i), (w1i, w2i)
    print("Weights %i were equal." % i)

print("All initial weights were equal.")
Even though all assertions passed, training model1 and model2 with shuffle=False yielded different models. That is, if I perform similar assertions on the weights of model1 and model2 after training, the assertions all fail. This suggests that the problem lies in randomness introduced during training.
As of this post I have not managed to figure out how to circumvent this.
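For reference, a hedged sketch of the seeding commonly suggested for a TensorFlow backend (TF 1.x API); as the experiments above suggest, seeding alone may not remove all variance, e.g. from multi-threading or GPU non-determinism:

import random
import numpy as np
import tensorflow as tf

random.seed(7)          # Python's built-in RNG
np.random.seed(7)       # NumPy RNG used by Keras for shuffling and initialization
tf.set_random_seed(7)   # graph-level TensorFlow seed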
