Using tf.metrics.mean_iou during training - metrics

I want to train a model using the TensorFlow Estimator API and track multiple metrics during training and evaluation. The metrics I want to track are accuracy and mean intersection-over-union (plus my loss).
I managed to figure out how to track the accuracy during training:
if mode == tf.estimator.ModeKeys.TRAIN:
    ...
    accuracy = tf.metrics.accuracy(labels=indices_ground_truth, predictions=indices_prediction, name='acc_op')
    tf.summary.scalar('accuracy', accuracy[1])
and evaluation:
if mode == tf.estimator.ModeKeys.EVAL:
    ...
    accuracy = tf.metrics.accuracy(labels=indices_ground_truth, predictions=indices_prediction)
    eval_metric_ops = {'accuracy': accuracy}
    return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=eval_metric_ops)
For evaluation, the mean intersection-over-union works the same way, so it is actually:
if mode == tf.estimator.ModeKeys.EVAL:
    ...
    miou = tf.metrics.mean_iou(labels=indices_ground_truth, predictions=indices_prediction, num_classes=13)
    accuracy = tf.metrics.accuracy(labels=indices_ground_truth, predictions=indices_prediction)
    eval_metric_ops = {'miou': miou,
                       'accuracy': accuracy}
    return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=eval_metric_ops)
As far as I know, during training I have to track the update operation (the second return value) instead of the value itself; otherwise it returns 0 every time. For a single value like the accuracy, that works.
But for the mIoU, the second return value is the update operation of the confusion matrix used to calculate the mIoU. That is a [numClass, numClass] tensor. If I try to track it like the accuracy, with tf.summary.scalar('miou', miou[1]), it crashes because a [numClass, numClass] tensor is not a scalar.
tf.summary.scalar('miou', miou[0]) gives me 0s every time.
So how can I pass the mIoU to the summary?

Here is how I calculate the IoU while training:
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(predict, raw_gt, num_classes=2, weights=None)
tf.summary.scalar('meanIoU', mIoU)
confusion_matrix, _ = sess.run([update_op, train_op], feed_dict=feed_dict)
iou = sess.run(mIoU)
print('iou score = {:.3f}, ({:.3f} sec/step)'.format(iou, duration))
You don't need to track the confusion matrix output to track the IoU on TensorBoard. The above works fine for me. I think what you are missing is running the tensors in your session: run the update op with sess.run(update_op), and read the metric value with sess.run(mIoU).
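To make the split between "update" and "value" concrete, here is a minimal pure-Python sketch of a streaming mean IoU. It is a toy stand-in, not the TensorFlow implementation; the class name StreamingMeanIoU and its methods are made up for illustration. It shows why reading the value before any update gives 0, and why the update step hands back a confusion matrix rather than a scalar.

```python
class StreamingMeanIoU:
    """Toy streaming mean-IoU metric (hypothetical, not TensorFlow's code)."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        # confusion[i][j] counts samples with label i that were predicted as j.
        self.confusion = [[0] * num_classes for _ in range(num_classes)]

    def update(self, labels, predictions):
        """Accumulate one batch; returns the confusion matrix, not a scalar."""
        for y, p in zip(labels, predictions):
            self.confusion[y][p] += 1
        return self.confusion

    def result(self):
        """Mean IoU over the classes seen so far; 0.0 if nothing was accumulated."""
        ious = []
        for c in range(self.num_classes):
            tp = self.confusion[c][c]
            row = sum(self.confusion[c])             # samples labelled c
            col = sum(r[c] for r in self.confusion)  # samples predicted c
            union = row + col - tp
            if union > 0:
                ious.append(tp / union)
        return sum(ious) / len(ious) if ious else 0.0


metric = StreamingMeanIoU(num_classes=2)
before = metric.result()  # no update has run yet
metric.update(labels=[0, 0, 1, 1], predictions=[0, 1, 1, 1])
after = metric.result()   # scalar, safe to log
print(before, after)
```

The scalar to log is always the value, read after the update for that step has run.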

Related

How do I add noise/variability to a dataset in Python, given the CV?

Given a dataset of blood results, say cholesterol level, and knowing that the instrument that produced those results is subject to a known degree of variability, how would I add that variability back into the dataset? i.e. I want to assume the result in the original dataset is the true/mean value, and then produce new results that are subject to the known variability of the instrument.
In Excel you use =NORM.INV(RAND(), mean, std_dev), where RAND() provides a random value between 0 and 1, "mean" will be the original value and I have the CV so I can calculate the SD. NORM.INV then provides the inverse of the cumulative normal distribution function.
I've done the following to create a new column with my new values, but would like to know if it is valid (i.e., will each row have a different random number between 0 and 1 as the probability, and is this formula equivalent to NORM.INV?).
df8000['HDL_1'] = norm.ppf(random(), loc = df8000['HDL_0'], scale = TAE_df.loc[0,'HDL'])
Thanks in advance!
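As a point of comparison, here is a minimal standard-library sketch of the Excel formula applied row by row. It assumes the SD is computed as CV × value and that all values are positive; the function name add_instrument_noise is hypothetical. Note that if random() is called with no arguments, as in the posted line, it returns a single float, so every row would share the same probability; drawing inside the loop gives each row its own.

```python
import random
from statistics import NormalDist


def add_instrument_noise(values, cv, rng):
    """Return one noisy result per value, like =NORM.INV(RAND(), mean, sd).

    Assumes sd = cv * value (cv given as a fraction, e.g. 0.05 for 5%)
    and that every value is positive.
    """
    noisy = []
    for value in values:
        sd = cv * value
        # A fresh probability per row; clamp away from 0, where inv_cdf is undefined.
        p = max(rng.random(), 1e-12)
        noisy.append(NormalDist(mu=value, sigma=sd).inv_cdf(p))
    return noisy


rng = random.Random(0)  # seeded only to make the sketch repeatable
hdl_0 = [50.0, 50.0, 50.0]
hdl_1 = add_instrument_noise(hdl_0, cv=0.05, rng=rng)
print(hdl_1)
```

NormalDist.inv_cdf plays the role of NORM.INV here, in the same way norm.ppf does in SciPy.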

Tuning max_depth in Random Forest using CARET

I'm building a random forest with the caret package in R, using method = "rf". Every type of random forest in caret seems to tune only mtry, which is the number of features randomly sampled as candidates at each split. I do not understand why the max_depth of each tree is not a tunable parameter (as it is for CART). In my mind, it is a parameter that can limit over-fitting.
For example, my rf seems to do much better on the training data than on the test data:
model <- train(
  group ~ ., data = train.data, method = "rf",
  trControl = trainControl("repeatedcv", number = 5, repeats = 10),
  tuneLength = 5
)
> postResample(fitted(model), train.data$group)
 Accuracy     Kappa
0.9574592 0.9745841
> postResample(predict(model, test.data), test.data$group)
 Accuracy     Kappa
0.7333333 0.5428571
As you can see my model is clearly over-fitted. However, I tried a lot of different things to handle this but nothing worked. I always have something like 0.7 accuracy on test data and 0.95 on train data. This is why I want to optimize other parameters.
I cannot share my data to reproduce this.

Why does a Gensim Doc2vec object return empty doctags?

How should I interpret the following situation?
I trained a Doc2Vec model following this tutorial: https://blog.griddynamics.com/customer2vec-representation-learning-and-automl-for-customer-analytics-and-personalization/.
For some reason, doc_model.docvecs.doctags returns {}. But doc_model.docvecs.vectors_docs seems to return a proper value.
Why doesn't the Doc2Vec object return any doctags, when it does return vectors_docs?
Thank you in advance for any comments and answers.
This is the code I used to train a Doc2Vec model.
from gensim.models.doc2vec import LabeledSentence, TaggedDocument, Doc2Vec
import multiprocessing as mp
import timeit
import gensim
from tqdm import tqdm

embeddings_dim = 200 # dimensionality of user representation
filename = f'models/customer2vec.{embeddings_dim}d.model'

if TRAIN_USER_MODEL:
    class TaggedDocumentIterator(object):
        def __init__(self, df):
            self.df = df
        def __iter__(self):
            for row in self.df.itertuples():
                yield TaggedDocument(words=dict(row._asdict())['all_orders'].split(),tags=[dict(row._asdict())['user_id']])

    it = TaggedDocumentIterator(combined_orders_by_user_id)
    doc_model = gensim.models.Doc2Vec(vector_size=embeddings_dim,
                                      window=5,
                                      min_count=10,
                                      workers=mp.cpu_count()-1,
                                      alpha=0.055,
                                      min_alpha=0.055,
                                      epochs=20) # use fixed learning rate
    train_corpus = list(it)
    doc_model.build_vocab(train_corpus)
    for epoch in tqdm(range(10)):
        doc_model.alpha -= 0.005 # decrease the learning rate
        doc_model.min_alpha = doc_model.alpha # fix the learning rate, no decay
        doc_model.train(train_corpus, total_examples=doc_model.corpus_count, epochs=doc_model.iter)
        print('Iteration:', epoch)
    doc_model.save(filename)
    print(f'Model saved to [{filename}]')
else:
    doc_model = Doc2Vec.load(filename)
    print(f'Model loaded from [{filename}]')
doc_model.docvecs.vectors_docs returns
If all of the tags you supply are plain Python ints, those ints are used as the direct-indexes into the vectors-array.
This saves the overhead of maintaining a mapping from arbitrary tags to indexes.
But, it may also cause an over-allocation of the vectors array, to be large enough for the largest int tag you provided, even if other lower ints are never used. (That is: if you provided a single document, with a tags=[1000000], it will allocate an array sufficient for tags 0 to 1000000, even if most of those never appear in your training data.)
If you want model.docvecs.doctags to collect a list of all your tags, use string tags rather than plain ints.
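The trade-off described above can be illustrated with a toy model of the bookkeeping. This is only a sketch of the idea, not gensim's actual code; the function name plan_vector_storage is made up.

```python
def plan_vector_storage(tags):
    """Toy model: return (rows_to_allocate, tag_to_index_mapping).

    Hypothetical illustration of the behaviour described above,
    not gensim's implementation.
    """
    if all(isinstance(tag, int) for tag in tags):
        # Plain ints index the array directly: no mapping is kept,
        # but the array must reach the largest tag.
        return max(tags) + 1, {}
    # Otherwise every distinct tag gets the next free row and is remembered.
    mapping = {}
    for tag in tags:
        mapping.setdefault(tag, len(mapping))
    return len(mapping), mapping


print(plan_vector_storage([1000000]))            # one document, a million-plus rows
print(plan_vector_storage(["user_7", "user_9"]))  # two rows, tags recorded
```

In this picture, the empty mapping in the int case corresponds to the empty doctags seen in the question, while the vectors themselves still exist.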
Separately: don't call train() multiple times in your own loop, or manage the alpha learning rate in your own code, unless you have an overwhelmingly good reason to do so. It's inefficient and error-prone. (Your code, for example, is actually performing 200 training epochs, and if you were to increase the loop count without carefully adjusting your alpha decrement, you could wind up with nonsensical negative alpha values, a very common error in code following this bad practice.) Call .train() once with your desired number of epochs. Set alpha and min_alpha to reasonable starting and nearly-zero values (probably just the defaults unless you're sure your change is helping) and then leave them alone.
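The arithmetic behind those two warnings can be checked with plain Python, using the numbers from the question (start alpha 0.055, decrement 0.005, 10 loops, epochs=20). The helper name alpha_after is made up for this sketch.

```python
START_ALPHA = 0.055    # alpha passed to Doc2Vec in the question
DECREMENT = 0.005      # subtracted at the top of every loop iteration
EPOCHS_PER_CALL = 20   # each train() call runs this many epochs


def alpha_after(loops):
    """Alpha value once the manual loop has run `loops` times."""
    alpha = START_ALPHA
    for _ in range(loops):
        alpha -= DECREMENT
    return alpha


total_epochs = 10 * EPOCHS_PER_CALL
print(total_epochs)      # 200 epochs in total, not 10 or 20
print(alpha_after(10))   # still positive with the loop count from the question
print(alpha_after(12))   # two more loops and alpha has gone negative
```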

EasyPredictModelWrapper giving wrong prediction

public BinomialModelPrediction predictBinomial(RowData data) throws PredictException {
    double[] preds = this.preamble(ModelCategory.Binomial, data);
    BinomialModelPrediction p = new BinomialModelPrediction();
    double d = preds[0];
    p.labelIndex = (int)d;
    String[] domainValues = this.m.getDomainValues(this.m.getResponseIdx());
    p.label = domainValues[p.labelIndex];
    p.classProbabilities = new double[this.m.getNumResponseClasses()];
    System.arraycopy(preds, 1, p.classProbabilities, 0, p.classProbabilities.length);
    if(this.m.calibrateClassProbabilities(preds)) {
        p.calibratedClassProbabilities = new double[this.m.getNumResponseClasses()];
        System.arraycopy(preds, 1, p.calibratedClassProbabilities, 0, p.calibratedClassProbabilities.length);
    }
    return p;
}
E.g.: classProbabilities = [0.82333, 0.276666]
labelIndex = 1
label = true
domainValues = [false,true]
What does this labelIndex signify, and is the order of the class probabilities the same as the order of the domain values? If the order is the same, then the probability of false is 0.82333 and the probability of true is 0.27666, so why is labelIndex 1 and label true?
Please help me figure out this issue.
Like Tom commented, the prediction is not "wrong". You can infer from this that the threshold H2O has chosen is less than 0.27666. You probably have imbalanced training data, otherwise H2O would have not picked a low threshold for classifying a predicted value of 0.27666 as a 1. Does your training set include fewer examples of the positive class than the negative class?
If you don't like that threshold for whatever reason, then you can manually create your own. Just make sure you know how to properly evaluate the effect of using different thresholds on the performance of your model, otherwise I'd recommend just using the default threshold.
The name, "classProbabilities" is a misnomer. These are not actual probabilities, they are predicted values, though people often use the terms interchangeably. Binary classification algorithms produce "predicted values" that look like probabilities when they're between 0 and 1, but unless a calibration process is performed, they are not going to represent the probabilities. Calibration is not necessarily a straight-forward process and there are many techniques. Here's some more info about calibration methods for imbalanced data. In H2O, you can perform calibration using Platt scaling using the calibrate_model option. But this is probably not really necessary to what you're trying to do.
The proper way to use the raw output from a binary classification model is to only look at the predicted value for the positive class (you can simply ignore the predicted value for the negative class). Then you choose a threshold which suits your needs, or you can use the default threshold in H2O, which is chosen to maximize the F1 score. Some other software will use a hardcoded threshold of 0.5, but that will be a terrible choice if you don't have an even number of positive and negative examples in your training data. If you have only a few positive examples in your training data, then the best threshold will be something much lower than 0.5.
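As a rough sketch of that procedure in plain Python (not H2O's code): take the predicted value for the positive class, compare it with a threshold, and, if you want a data-driven threshold, pick the one that maximizes F1 on labelled data. The function names and the threshold 0.2 below are assumptions for illustration only.

```python
def label_from_positive_score(class_scores, threshold, domain):
    """Label a row using only the positive-class score (index 1)."""
    label_index = 1 if class_scores[1] >= threshold else 0
    return label_index, domain[label_index]


def best_f1_threshold(scores, labels):
    """Try each observed score as a threshold and keep the one with the highest F1."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


domain = ["false", "true"]
scores_for_row = [0.82333, 0.27666]  # values from the question
low = label_from_positive_score(scores_for_row, threshold=0.2, domain=domain)
half = label_from_positive_score(scores_for_row, threshold=0.5, domain=domain)
print(low, half)

# Tiny made-up validation set, only to exercise the threshold search.
t, f1 = best_f1_threshold([0.1, 0.2, 0.3, 0.8], [0, 0, 1, 1])
print(t, f1)
```

With a threshold below 0.27666 the row from the question is labelled true, which matches the labelIndex of 1 that was observed.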

Image Classification. Validation loss stuck during training with inception (v1)

I have built a small custom image classification training/val dataset with 4 classes.
The training dataset has ~110,000 images.
The validation dataset has ~6,000 images.
The problem I'm experiencing is that, during training, both training accuracy (measured as an average accuracy on the last training samples) and training loss improve, while validation accuracy and loss stay the same.
This only occurs when I use Inception and ResNet models. If I use an AlexNet model on the same training and validation data, the validation loss and accuracy improve.
In my experiments I am using several convolutional architectures, imported from tensorflow.contrib.slim.nets.
The code is organized as follows:
...
images, labels = preprocessing(..., train=True)
val_images, val_labels = preprocessing(..., train=False)
...
# AlexNet model
with slim.arg_scope(alexnet.alexnet_v2_arg_scope()):
    logits, _ = alexnet.alexnet_v2(images, ..., is_training=True)
    tf.get_variable_scope().reuse_variables()
    val_logits, _ = alexnet.alexnet_v2(val_images, ..., is_training=False)
# Inception v1 model
with slim.arg_scope(inception_v1_arg_scope()):
    logits, _ = inception_v1(images, ..., is_training=True)
    val_logits, _ = inception_v1(val_images, ..., is_training=False, reuse=True)
loss = my_stuff.loss(logits, labels)
val_loss = my_stuff.loss(val_logits, val_labels)
training_accuracy_op = tf.nn.in_top_k(logits, labels, 1)
top_1_op = tf.nn.in_top_k(val_logits, val_labels, 1)
train_op = ...
...
Instead of using a separate eval script, I'm running the validation step at the end of each epoch and also, for debugging purposes, I'm running an early val step (before training) and I'm checking the training accuracies by averaging training predictions on the last x steps.
When I use the Inception v1 model (commenting out the alexnet one) the logger output is as follows after 1 epoch:
early Validation Step
precision # 1 = 0.2440 val loss = 1.39
Starting epoch 0
step 50, loss = 1.38, training_acc = 0.3250
...
step 1000, loss = 0.58, training_acc = 0.6725
...
step 3550, loss = 0.45, training_acc = 0.8063
Validation Step
precision # 1 = 0.2473 val loss = 1.39
As shown, training accuracy and loss improve a lot after one epoch, but the validation loss doesn't change at all. This has been tested at least 10 times, the result is always the same. I would understand if the validation loss was getting worse due to overfitting, but in this case it's not changing at all.
To rule out any problems with the validation data, I'm also presenting the results while training using the AlexNet implementation in slim. Training with the alexnet model produces the following output:
early Validation Step
precision # 1 = 0.2448 val loss = 1.39
Starting epoch 0
step 50, loss = 1.39, training_acc = 0.2587
...
step 350, loss = 1.38, training_acc = 0.2919
...
step 850, loss = 1.28, training_acc = 0.3898
Validation Step
precision # 1 = 0.4069 val loss = 1.25
Accuracy and loss, on both training and validation data, improve as expected when using the AlexNet model, and they keep improving in subsequent epochs.
I don't understand what may be the cause of the problem, and why it presents itself when using inception/resnet models, but not when training with alexnet.
Does anyone have ideas?
After searching through forums, reading various threads and experimenting I found the root of the problem.
The problem was a train_op that was basically recycled from another example. It worked well with the AlexNet model but not with the other models, because it was missing the batch normalization updates.
To fix this I had to use either
optimizer = tf.train.GradientDescentOptimizer(0.005)
train_op = slim.learning.create_train_op(total_loss, optimizer)
or
train_op = tf.contrib.layers.optimize_loss(total_loss, global_step, .005, 'SGD')
This seems to take care of running the batchnorm updates.
The problem still persisted for short training runs because the moving averages update slowly.
The default slim arg_scope had the decay set to 0.9997, which is stable but apparently needs many steps to converge. Using the same arg_scope but with decay set to 0.99 or 0.9 did help in this short training scenario.
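A quick back-of-the-envelope check shows why the decay matters for short runs. With an exponential moving average, avg = decay * avg + (1 - decay) * x, the initial value still carries a weight of decay ** steps after that many updates. The sketch below assumes one moving-average update per training step and uses 3550 steps, the last step logged in the first epoch above; the function name is made up.

```python
def initial_weight_remaining(decay, steps):
    """Weight the initial moving-average value still has after `steps` updates."""
    return decay ** steps


STEPS_IN_FIRST_EPOCH = 3550  # last step shown in the Inception log above

for decay in (0.9997, 0.99, 0.9):
    remaining = initial_weight_remaining(decay, STEPS_IN_FIRST_EPOCH)
    print(f"decay={decay}: {remaining:.3g} of the initial statistics remain")
```

With 0.9997, roughly a third of the initial statistics are still present after one epoch, whereas with 0.99 or 0.9 they have effectively vanished, which fits the observation that a lower decay helped in the short-training case.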
It seems you are using logits to calculate the validation loss; use predictions, it may help.
val_logits, _ = inception_v1(val_images, ..., is_training=False, reuse=True)
val_logits = tf.nn.softmax(val_logits)

Resources