Model Monitoring for Image Data not working in Vertex AI - google-cloud-vertex-ai

My use case is multiclass image classification. I deployed a CNN model to production and enabled Model Monitoring for prediction drift detection only, which does not require training data. This automatically created two folders, analysis and predict, in the storage bucket. Then, as a prerequisite, I sent 1000 prediction instances for testing (the same request 1000 times through Apache Bench). I configured the monitoring job to run every hour with a 100% sampling rate, but I am not getting any output or logs in the newly created buckets.
What is going wrong here?
Is Model Monitoring (prediction drift detection) not supported for image data in Vertex AI?
What steps do I need to take to verify that Model Monitoring works for an image classification model? We need evidence in the form of logs generated in the two buckets.

Model monitoring is currently only supported for tabular AutoML and tabular custom-trained models [1]. It is not supported for custom-trained image classification models.
For a more proactive approach that should minimize prediction drift in image classification models, the Vertex AI team would recommend the following:
• Augmenting your data so that you have a more diverse set of samples. This set should match your business needs and include transformations that are meaningful in your context. Please refer to [2] for more information about data augmentation; a minimal sketch follows the references below.
• Using Vertex Explainable AI to identify the features that contribute the most to your model's classification decisions. This helps you augment your data in a more informed way. Please refer to [3] for more information about Vertex Explainable AI.
[1] https://cloud.google.com/vertex-ai/docs/model-monitoring/overview
[2] https://www.tensorflow.org/tutorials/images/data_augmentation
[3] https://cloud.google.com/vertex-ai/docs/explainable-ai/overview
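For illustration, here is a minimal data-augmentation sketch in the spirit of [2], assuming a Keras image classifier; the layer choices and parameters are examples, not recommendations from the Vertex AI team.
import tensorflow as tf

# Simple augmentation pipeline: random flips, rotations, and zooms.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

# Place augmentation at the front of the model; it is active only during training.
model = tf.keras.Sequential([
    data_augmentation,
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),  # 3 classes, purely as an example
])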

Related

How to validate my YOLO model trained on custom data set?

I am doing research on object detection using YOLO, although I am from the civil engineering field and not familiar with computer science. My advisor is asking me to validate my YOLO detection model, which was trained on a custom dataset. My problem is that I don't really know how to validate the model, so please point me to how I should do it.
Thanks in advance.
I think you first need to make sure that all the cases you are interested in (location of objects, their size, general view of the scene, etc.) are represented in your custom dataset; in other words, that the collected data reflects your task. You can discuss this with your advisor. The main rule is to label the data, with consistent quality, in the same manner as you want to see it in the output; more information can be found here.
This is really important: garbage in, garbage out. The quality of the output of your trained model is determined by the quality of the input (the labelled data).
Once this is done, it is common practice to split your data into training and test sets. Only the train set is used during model training, and you can later validate the quality (generalization ability, robustness, etc.) on data the model has never seen, i.e. the test set. It is also important that the two subsets do not overlap; otherwise you are effectively evaluating on training data, which hides overfitting and tells you nothing about how the model performs the task on new data.
Then you can train a few different models (with some architectural changes, for example) on the same train set and validate them on the same test set; this is a regular validation process. A minimal sketch of such a split is shown below.
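As a rough illustration of the split described above (my own sketch, not part of the original answer), assuming your images and YOLO-format label files form paired lists; the paths are placeholders:
from sklearn.model_selection import train_test_split

# Placeholder paths; replace with your actual image and YOLO-format label files.
image_paths = [f"images/img_{i}.jpg" for i in range(100)]
label_paths = [f"labels/img_{i}.txt" for i in range(100)]

# Hold out 20% of the data as a test set the model never sees during training.
train_imgs, test_imgs, train_lbls, test_lbls = train_test_split(
    image_paths, label_paths, test_size=0.2, random_state=42
)
The held-out test set is then used to compute detection metrics such as mAP, which is typically the number you would report to your advisor.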

Can anyone explain eager_few_shot_od_training_tflite.ipynb code

I am trying to understand the example code provided by the TensorFlow team in the GitHub notebook eager_few_shot_od_training_tflite.ipynb. I am able to understand most of the code except the few lines below.
I am not sure why we are creating the fake_box_predictor variable and the fake_model.
Why can't we restore the checkpoint on detection_model directly instead of creating fake_model?
I also do not understand the comment above the code: "We will restore the box regression head but initialize the classification head from scratch".
Can anyone explain in detail what the code below is doing (and what the comments above it mean)?
# Set up object-based checkpoint restore --- SSD has two prediction
# `heads` --- one for classification, the other for box regression. We will
# restore the box regression head but initialize the classification head
# from scratch (we show the omission below by commenting out the line that
# we would add if we wanted to restore both heads)
fake_box_predictor = tf.compat.v2.train.Checkpoint(
    _base_tower_layers_for_heads=detection_model._box_predictor._base_tower_layers_for_heads,
    # _prediction_heads=detection_model._box_predictor._prediction_heads,
    #     (i.e., the classification head that we *will not* restore)
    _box_prediction_head=detection_model._box_predictor._box_prediction_head,
)
fake_model = tf.compat.v2.train.Checkpoint(
    _feature_extractor=detection_model._feature_extractor,
    _box_predictor=fake_box_predictor)
ckpt = tf.compat.v2.train.Checkpoint(model=fake_model)
ckpt.restore(checkpoint_path).expect_partial()
The details and documentation on this topic are extremely sparse. My understanding is as follows:
1. This is the process for loading a partial model. _base_tower_layers_for_heads contains the earlier layers that sit before the class prediction and box regression heads. _box_prediction_head predicts the bounding boxes. If _prediction_heads=detection_model._box_predictor._prediction_heads had been used instead, it would restore both the classification and regression heads, since it contains both box_prediction_heads and class_prediction_heads. detection_model._feature_extractor is the initial feature extraction network, such as ResNet or MobileNet + FPN. fake_model connects the box regression head and its earlier layers with the base feature extraction layers (e.g. MobileNet) in one computation graph; vars(fake_model) will contain _box_predictor and _feature_extractor.
2. Here the weights of the pretrained checkpoint are loaded only for some of the layers, for transfer learning, and the rest are initialized from scratch to learn the new class. The detection model is built from the given config. If the checkpoint were restored onto detection_model directly, it would detect the classes from the COCO dataset, since the TF2 Detection Model Zoo pretrained models are trained on COCO.
3. The goal is to classify a new class of images, which is associated with the class prediction layers. Those layers should therefore be initialized from scratch, while the feature extraction and bounding box regression layers are restored from the checkpoint so the model can take advantage of the pretrained weights. This helps the model converge faster and detect the new desired classes.
The desired weights are then restored from the pretrained checkpoint, and expect_partial() is used to silence warnings about unused parts of the checkpoint.
The TensorFlow Advanced Computer Vision course on Coursera has more details on this, as well as on the overall TensorFlow Object Detection API structure and code. Corrections for any mistakes are welcome.
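To make the mechanism concrete, here is a small self-contained sketch (my own illustration, assuming TF 2.x, not from the notebook) of how tf.train.Checkpoint restores only the sub-objects wired into it, which is what the fake_model construction above exploits:
import tensorflow as tf

# A toy "model" with two parts, analogous to a feature extractor and a prediction head.
backbone = tf.keras.layers.Dense(4, name="backbone")
head = tf.keras.layers.Dense(2, name="head")
backbone.build((None, 3))
head.build((None, 4))

# Save a checkpoint that contains both parts.
full_ckpt = tf.train.Checkpoint(backbone=backbone, head=head)
path = full_ckpt.save("/tmp/toy_ckpt")

# Restore only the backbone into a fresh layer; the head is deliberately left out,
# mirroring "restore the box regression head but initialize the classification
# head from scratch".
new_backbone = tf.keras.layers.Dense(4, name="backbone")
new_backbone.build((None, 3))
partial_ckpt = tf.train.Checkpoint(backbone=new_backbone)
partial_ckpt.restore(path).expect_partial()  # silences warnings about the unrestored head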

Discovery model offline during incremental training

A question now that we are using Discovery. We were thinking we would do incremental training of Discovery while it is in production, as we gather bits of training data from the faculty (SMEs) in CogUniversity. However, it seems that while Discovery is training, it does not return a confidence score. Is there a way around that? To me, the big benefit of incremental training is that we can improve the machine learning model while it is being used in production. Incremental training doesn't help much if the system has to be taken out of production while training. Please advise.
Training a new model doesn't take the old one offline, but deleting all of the training data for a collection will. If your incremental training process involves deleting all of the training data and uploading different data, then that could be why you're not seeing confidence scores while the new model trains.

h2o.ai Platt Scaling calibration

I noticed a relatively recent addition to the h2o.ai suite: the ability to perform supplementary Platt scaling to improve the calibration of output probabilities (see calibrate_model in the H2O manual). However, little guidance is available in the online help docs. In particular, I wonder, when Platt scaling is enabled:
How does it affect the models' leaderboard? That is, is the Platt scaling computed before or after the ranking metric?
How does it affect computing performance?
Can the calibration_frame be the same as the validation_frame, or should it not be (from both a computational and a theoretical point of view)?
Thanks in advance
Calibration is a post-processing step run after the model finishes. Therefore it doesn't affect the leaderboard, and it has no effect on the training metrics either. It adds two more columns to the scored frame (with the calibrated predictions).
This article provides guidance on how to construct a calibration frame:
Split the dataset into test and train sets.
Split the train set into model training and calibration sets.
It also says:
The most important step is to create a separate dataset to perform calibration with.
I think the calibration frame should be used only for calibration, and hence be distinct from the validation frame. The conservative answer is that they should be separate: when you use a validation frame for early stopping or any internal model tuning (e.g. lambda search in H2O GLM), that validation frame becomes an extension of the "training data", so it is effectively off-limits at that point. However, you could try both versions, directly observe the effect, and then make a decision. Here's some additional guidance from the article:
"How much data to use for calibration will depend on the amount of data you have available. The calibration model will generally only be fitting a small number of parameters (so you do not need a huge volume of data). I would aim for around 10% of your training data, but at a minimum of at least 50 examples."

H2O AutoML building a large number of GBM models

I tried to use AutoML for a binary classification task with a 100-hour runtime. It appears that it is just building a large number of GBM models and not getting to the other types (40 built so far).
Is there a way to set the maximum number of GBM models?
There is an order in which AutoML builds the models (the GBMs are first in line). The length of the GBM model-building process depends on how much time you set for max_runtime_secs. If you plan to run it for 100 hours, then a good portion of that will be spent in the GBM hyperparameter space, so I am not surprised that your first 40 models are GBMs. In other words, this is expected behavior.
If you want variety in your models as they are training, you can run a single AutoML job with a smaller max_runtime_secs (say 2 hours), and then run the AutoML process again on that same project (49 more times at 2 hours each, or some combination that adds up to 100 hours). If you use the same project_name when you start an AutoML job, a full new set of models (GBMs, RFs, DNNs, GLMs) should be added to the existing AutoML leaderboard; see the sketch below.
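A minimal sketch of that pattern (my own example; the data path, target column, and session count are placeholders):
import h2o
from h2o.automl import H2OAutoML

h2o.init()
df = h2o.import_file("your_data.csv")    # placeholder path
df["label"] = df["label"].asfactor()     # binary target assumed

# Several shorter sessions sharing a project_name accumulate on one leaderboard.
for _ in range(3):                       # e.g. 3 x 2-hour sessions instead of one long run
    aml = H2OAutoML(
        max_runtime_secs=2 * 3600,
        project_name="my_automl_project",  # same name -> models added to the same leaderboard
        seed=42,
    )
    aml.train(y="label", training_frame=df)
    print(aml.leaderboard.head())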
As Erin said, if you run AutoML multiple times with the same project_name the results will accumulate into a single leaderboard and the hyperparameter searches will accumulate into the same grid objects. However, AutoML will still run through the same sequence of model builds, so it will do a GBM hyperparameter search again before it gets to the DL model builds.
It sounds like your GBM hyperparameter search isn't converging because the stopping_tolerance is too small for your dataset. There was a bug in pre-release versions of the bindings that forced the stopping_tolerance to 0.001 instead of letting AutoML set it higher when it calculated that this tolerance was too tight for a small dataset. Which version of H2O-3 are you using?
A bit about stopping criteria:
The stopping_criteria such as max_models, stopping_rounds, and stopping_tolerance apply to the overall AutoML process as well as to the hyperparameter searches and the individual model builds. At the beginning of the run max_runtime_secs is used to compute the end time for the entire process, and then at each stage the remaining overall time is computed and is passed down to the model build or hyperparameter search subtask.
The Run Time of 558:10:56.131 that you posted is really strange. I don't see that sort of output in the AutoML.java code, nor in the Python or R bindings; at first glance it looks like it is coming from outside of H2O. Do you have any sense of what the real (wall-clock) time was for this run?
We should be able to figure out what's going on if you do the following:
If you're not on the most recent release version, 3.14.x, please upgrade.
While we're debugging, please set the seed parameter for your AutoML run so that we get repeatable results (see the sketch below).
Please post your stopping criteria, your leaderboard output, and your User Feedback output, and send your H2O logs to rpeck (at) h2o.ai and support (at) h2o.ai in case we need to delve further. You can grab the H2O logs from the server or download them using Flow.
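For reference, a minimal sketch (my own, assuming the Python bindings; all values are illustrative) of pinning down the stopping criteria and seed discussed above so that a run is both bounded and repeatable:
import h2o
from h2o.automl import H2OAutoML

h2o.init()
df = h2o.import_file("your_data.csv")    # placeholder path
df["label"] = df["label"].asfactor()

aml = H2OAutoML(
    max_runtime_secs=7200,      # overall time budget, passed down to each subtask
    max_models=20,              # cap on the total number of models AutoML builds
    stopping_rounds=3,          # early-stopping window for searches and model builds
    stopping_tolerance=0.005,   # loosen this if your dataset is small
    seed=1234,                  # fixed seed for repeatable results while debugging
)
aml.train(y="label", training_frame=df)
print(aml.leaderboard)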
