I'm working on image classification (multiclass). This is a custom dataset. When fitting the model, I get this warning, and validation loss and accuracy are not displayed. How do I fix this?
Epoch 1/20
61/365 [====>.........................] - ETA: 8:05 - loss: 6.7683 - accuracy: 0.2262
/usr/local/lib/python3.8/dist-packages/PIL/TiffImagePlugin.py:788: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
  warnings.warn(str(msg))
Please help me solve this.
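For what it's worth, the EXIF warning comes from Pillow while it decodes one of the images, and it does not by itself block metric reporting; val_loss and val_accuracy only appear at the end of an epoch, and only if validation data is passed to fit. A minimal sketch of both points (train_ds and val_ds are hypothetical dataset names standing in for your own):

import warnings

# Silence the Pillow EXIF warning so it stops interrupting the progress bar.
warnings.filterwarnings('ignore', message='Corrupt EXIF data')

model.fit(
    train_ds,                # hypothetical training dataset
    validation_data=val_ds,  # hypothetical validation data; required for val_loss / val_accuracy
    epochs=20,
)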
Actually, this is a sequel to an earlier post.
I am training a Word2Vec model using gensim with the parameters hs=1, sg=0, and negative=0. Less training time is required after the code was modified, but something seems to have gone wrong with the loss: it first increases and then decreases, and I don't know what happened.
The code is as follows:
import logging
import time

from gensim.models import word2vec
from gensim.models.callbacks import CallbackAny2Vec

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',
                    level=logging.INFO)

sentences = word2vec.Text8Corpus("text8")  # loading the corpus

loss_list = []

class Callback(CallbackAny2Vec):
    """Record and print the training loss at the end of every epoch."""

    def __init__(self):
        self.epoch = 0

    def on_epoch_end(self, model):
        loss = model.get_latest_training_loss()
        loss_list.append(loss)
        print('Loss after epoch {}:{}'.format(self.epoch, loss))
        model.running_training_loss = 0.0  # reset the tally so each epoch is reported on its own
        self.epoch = self.epoch + 1

start_time = time.time()
model = word2vec.Word2Vec(sentences, hs=1, sg=0, negative=0, compute_loss=True,
                          epochs=30, callbacks=[Callback()])
end_time = time.time()
print('Running time: %s seconds' % (end_time - start_time))
The code is actually written in a Jupyter notebook. The full per-epoch output is as follows:
Loss after epoch 0:39370848.0
Loss after epoch 1:43579636.0
Loss after epoch 2:45213772.0
Loss after epoch 3:46132356.0
Loss after epoch 4:46788412.0
Loss after epoch 5:47218508.0
Loss after epoch 6:47553520.0
Loss after epoch 7:47793332.0
Loss after epoch 8:47995616.0
Loss after epoch 9:48134664.0
Loss after epoch 10:48224960.0
Loss after epoch 11:48326640.0
Loss after epoch 12:48371072.0
Loss after epoch 13:48405980.0
Loss after epoch 14:48437804.0
Loss after epoch 15:48417612.0
Loss after epoch 16:48415112.0
Loss after epoch 17:48396260.0
Loss after epoch 18:48349064.0
Loss after epoch 19:48301088.0
Loss after epoch 20:48247328.0
Loss after epoch 21:48167340.0
Loss after epoch 22:48053500.0
Loss after epoch 23:47937300.0
Loss after epoch 24:47810964.0
Loss after epoch 25:47669088.0
Loss after epoch 26:47500524.0
Loss after epoch 27:47300488.0
Loss after epoch 28:47044920.0
Loss after epoch 29:46747080.0
Running time: 259.9046218395233 seconds
I wouldn't expect that pattern of rising-then-falling loss; I would think that usual SGD optimization would achieve falling full-epoch loss from the very beginning.
However, if the end result vectors are still performing well, I wouldn't worry too much about surprises in secondary progress indicators, like that loss tally, for a number of reasons:
as noted in my previous answer (& further discussed in the Gensim project open issue #2617), Gensim's external loss-reporting has a number of known bugs & inconsistencies. Any oddness in observed loss-reporting may be a side-effect of those issues, without necessarily indicating any problem with the actual training updates.
it appears you're finishing 30 training epochs in 260 seconds – each full training pass in under 9 seconds. That suggests your training data is pretty tiny – perhaps too small to be a good example of word2vec capabilities, or too small for the default 100-dimensional vectors. That smallness, or other peculiarities in the training data, might contribute to atypical loss trends, or exercise some of the other weaknesses of the current loss-tallying code. If the same hard-to-explain pattern occurs with a more typical training corpus – such as one 100x larger – then it'd be more interesting to do a deep-dive investigation to understand what's happening. But unexpected results on tiny/unusual/atypical training runs might just be because such runs are far from where the usual intuitions apply, and getting to the bottom of their causes may be less productive than acquiring enough data to run the algorithm in a more typical/reliable way.
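On the first point: since the reported tally can accumulate across epochs (one of the quirks tracked in issue #2617), a common workaround is to log per-epoch deltas of the cumulative value instead of resetting model.running_training_loss directly. A minimal sketch, assuming that cumulative behaviour applies to your gensim version (the class name DeltaLossLogger is just illustrative):

from gensim.models.callbacks import CallbackAny2Vec

class DeltaLossLogger(CallbackAny2Vec):
    """Print per-epoch loss as the difference between successive
    cumulative values from get_latest_training_loss()."""

    def __init__(self):
        self.epoch = 0
        self.previous = 0.0

    def on_epoch_end(self, model):
        cumulative = model.get_latest_training_loss()
        print('Loss delta for epoch {}: {}'.format(self.epoch, cumulative - self.previous))
        self.previous = cumulative
        self.epoch += 1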
I have seen most tutorials/guides put the validation step outside the epoch loop, but a guide I follow has the validation step inside the epoch loop. Which one is right?
I notice that if you have the validation inside the epoch loop you can plot the validation loss per epoch, but you can't get a proper confusion matrix (because you end up validating the same image dataset all over again), and vice versa. Or maybe I just haven't found a proper way yet. Any suggestions?
Thanks
The general way to write a training/validation loop in PyTorch is:
for epoch in range(num_epochs):
    for phase, dataloader in [('train', train_dataloader), ('val', val_dataloader)]:
        if phase == 'train':
            model.train()   # training mode: dropout active, batch-norm statistics updated
            train(model, dataloader)
        else:
            model.eval()    # evaluation mode for the validation pass
            val(model, dataloader)
        # calculate the remaining statistics for this phase appropriately
A point to note here is that train_dataloader and val_dataloader are created from the train_dataset depending on your cross-validation strategy (random split, stratified split, etc.).
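To address the confusion-matrix concern from the question above: one option is to accumulate the matrix during the validation phase of each epoch. A minimal sketch, assuming the dataloader yields (inputs, targets) batches and num_classes is known (the function name is just illustrative):

import torch

def validation_confusion_matrix(model, val_dataloader, num_classes, device='cpu'):
    # Rows are true labels, columns are predicted labels.
    confusion = torch.zeros(num_classes, num_classes, dtype=torch.long)
    model.eval()
    with torch.no_grad():
        for inputs, targets in val_dataloader:
            logits = model(inputs.to(device))
            preds = logits.argmax(dim=1).cpu()
            for t, p in zip(targets, preds):
                confusion[t, p] += 1
    return confusion

Because it runs on the same fixed validation set every epoch, you can both plot the per-epoch validation loss and keep the matrix from whichever epoch you care about (typically the last or the best one).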
I have just trained a new model with a binary outcome (elite/non-elite). The model trained well, but when I tested a new image on it in the GUI it returned a third label, --other--. I am not sure how or why that has appeared. Any ideas?
When multi-class (single-label) classification is used, there is an assumption that the confidences of all predictions must sum to 1 (as one and exactly one valid label is assumed). This is achieved by using the softmax function, which normalizes all predictions to sum to 1. This has some drawbacks: for example, if both raw predictions are very low, say "elite" at 0.0001 and "Non_elite" at 0.0002, then after normalization the predictions would be 0.333 and 0.666 respectively.
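To illustrate that arithmetic with a toy sketch (not the actual AutoML implementation):

raw_scores = {'elite': 0.0001, 'Non_elite': 0.0002}

# Normalizing so the confidences sum to 1 inflates both tiny scores.
total = sum(raw_scores.values())
normalized = {label: score / total for label, score in raw_scores.items()}
print(normalized)  # roughly {'elite': 0.333, 'Non_elite': 0.666}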
To work around that, the AutoML system allows an extra label (--other--) to be used to indicate that none of the allowed predictions seems valid. This label is an implementation detail and shouldn't be returned by the system (it should be filtered out). This should get fixed in the near future.
During training of my CNN with Keras, after each epoch I obtain the validation accuracy (val_acc). For instance, I obtain val_acc: 0.9910, which means that the currently trained model can correctly predict, as expected, 991 out of the 1000 samples in my validation dataset. Correct?
Then, how can I know (via a callback, or maybe by enabling some level of verbosity) which 9 samples of my validation dataset resulted in an incorrect prediction?
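For reference, a minimal sketch of one way to find them after training, assuming the validation data is available as arrays x_val and y_val with integer class labels (both names are assumptions about your setup):

import numpy as np

# Predicted class per validation sample (assumes a softmax, multi-class output layer).
pred_classes = np.argmax(model.predict(x_val), axis=1)

# Indices of the validation samples that were predicted incorrectly.
wrong_idx = np.where(pred_classes != y_val)[0]
print('Misclassified validation sample indices:', wrong_idx)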
I have a binary classification problem for financial ratios and variables. When I use newff (with trainlm, mse, and a threshold of 0.5 for the output) I get high classification accuracy (5-fold cross-validation, around 89-92%), but when I use patternnet (trainscg with cross-entropy) my accuracy is about 10% lower than with newff. (I normalized the data before feeding it to the network, with mapminmax or mapstd.)
When I use these models on out-of-sample data (for the current year, with models built on the previous year(s)' data sets), I get better classification accuracy with patternnet, along with better sensitivity and specificity. For example, these are my results:
Newff:
Accuracy: 92.8% sensitivity: 94.08% specificity: 91.62%
Out sample results: accuracy: 60% sensitivity: 48% and specificity: 65.57%
Patternnet:
Accuracy: 73.31% sensitivity: 69.85% specificity: 76.77%
Out sample results: accuracy: 70% sensitivity: 62.79% and specificity: 73.77%
Why do we have these differences between newff and patternnet? Which model should I use?
Thanks.