How do I do model versioning with AutoMLImageTrainingJobRunOp? - google-cloud-vertex-ai

I know that in Vertex AI you can version models. For example, you can upload a model and set its parent_model:
model_v2 = aip.Model.upload(parent_model=model_v1.resource_name,...
And I know in the GUI you can create an AutoML model that is a version of an existing one, but how do you do it in code?
In a pipeline I use AutoMLImageTrainingJobRunOp but it does not have a parent_model parameter.

You can enable model versioning using the below code snippet:
from google.cloud import aiplatform

DISPLAY_NAME = "model_name"
models = aiplatform.Model.list(filter="display_name={}".format(DISPLAY_NAME))
if len(models) == 0:
    # No model with this display name exists yet: upload the first version
    model_upload = aiplatform.Model.upload(
        display_name=DISPLAY_NAME,  # Your model display name
        version_description="Add model description here",  # Add model description
        version_aliases=["v1"],  # Create model alias
        labels={"release": "dev"},  # Label your model
        artifact_uri=model.uri[:-6],
        ...
    )
else:
    # A model already exists: upload the new one as a version of it
    parent_model = models[0].resource_name
    version_id = models[0].version_id
    model_upload = aiplatform.Model.upload(
        display_name=DISPLAY_NAME,
        ...,
        parent_model=parent_model
    )
The code also takes other parameters, such as those for serving containers; you can remove them if you don't need them.
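The branching in that snippet comes down to one rule: pass parent_model only when a model with the same display name already exists. Below is a minimal sketch of that decision, kept free of Vertex AI calls so it can be run locally. `build_upload_kwargs` and `FakeModel` are hypothetical names used for illustration, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Any, Dict, Sequence


@dataclass
class FakeModel:
    """Hypothetical stand-in for an aiplatform.Model; only the field used below."""
    resource_name: str


def build_upload_kwargs(display_name: str, existing: Sequence[Any]) -> Dict[str, Any]:
    """Return keyword arguments for a model upload.

    If a model with this display name already exists, the upload is
    registered as a new version of it via parent_model; otherwise a
    brand-new model is created.
    """
    kwargs: Dict[str, Any] = {"display_name": display_name}
    if existing:
        kwargs["parent_model"] = existing[0].resource_name
    return kwargs


# First upload: no existing model, so no parent_model is set.
first = build_upload_kwargs("model_name", [])
# Later upload: the existing model becomes the parent.
later = build_upload_kwargs(
    "model_name",
    [FakeModel("projects/123/locations/us-central1/models/456")],
)
print(first)
print(later)
```

In real code the `existing` list would come from `aiplatform.Model.list(filter=...)` as in the answer, and the resulting dictionary would be unpacked into `aiplatform.Model.upload(...)` together with the remaining arguments.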

Related

Access information from a previous Kubeflow component

I have a ModelBatchPredictOp component in my pipeline. This component generates 3 artifacts: batchpredictionjob, big_query_table, and gcs_output_directory. The pipeline is running fine.
What I need is a way to access the tableId property of artifact big_query_table, so I can use it in the next component (a BigqueryQueryJobOp).
The artifact's URI would work as well, since it contains the full path of the created table and I can extract the part I need from it.
This is my 2nd component which is creating the batch prediction and the 3rd component which should consume the previous outputs.
# Component to do the batch prediction
batch_predict_op = ModelBatchPredictOp(
    project=project_id,
    location=DEFAULT_VERTEX_REGION,
    instances_format='bigquery',
    predictions_format='bigquery',
    model=importer_spec.outputs['artifact'],
    job_display_name='teste_batch_predict',
    bigquery_source_input_uri=f'bq://{input_data_table_ref}',
    bigquery_destination_output_uri=f'bq://{output_bq}',
).after(input_data_table_op)

top_predictions_table_ref = f'{project_id}.{bigquery_dataset}.test'

# Component to create the table based on previous component
top_predictions_op = bq.BigqueryQueryJobOp(
    project_id,
    location=bigquery_job_location,
    query=predict_dataset.get_query(
        output_table=top_predictions_table_ref,
        source_table=batch_predict_op.outputs['bigquery_output_table'],
        query_name='query_top_100.sql',
        DEBUG=DEBUG)
).after(batch_predict_op)
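One way to do the extraction the question mentions, as a sketch only. It assumes the artifact URI has the form `https://www.googleapis.com/bigquery/v2/projects/<project>/datasets/<dataset>/tables/<table>`, which should be checked against the URI shown in your own pipeline run. `parse_bq_table_uri` is a hypothetical helper, not part of any library.

```python
import re
from typing import Dict

# Assumed URI layout; verify it against the URI of your own artifact.
_BQ_TABLE_URI = re.compile(
    r"/projects/(?P<project>[^/]+)/datasets/(?P<dataset>[^/]+)/tables/(?P<table>[^/]+)$"
)


def parse_bq_table_uri(uri: str) -> Dict[str, str]:
    """Split a BigQuery table URI into its project, dataset and table IDs."""
    match = _BQ_TABLE_URI.search(uri)
    if match is None:
        raise ValueError(f"Unrecognised BigQuery table URI: {uri!r}")
    return match.groupdict()


# Made-up example URI following the assumed layout.
parts = parse_bq_table_uri(
    "https://www.googleapis.com/bigquery/v2/projects/my-project/datasets/my_dataset/tables/predictions"
)
table_ref = "{project}.{dataset}.{table}".format(**parts)
print(table_ref)
```

Inside a pipeline this logic would have to run in its own lightweight component, because the outputs of a previous step are only known at run time, not when the pipeline is compiled.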

How do I enable following URL capability to work in my code?

I am trying to add link following to my spider but can't get it to work. I need to crawl all the pages; there are around 108 pages of job listings. Thank you.
import scrapy


class JobItem(scrapy.Item):
    # Data structure to store the title, company name and location of the job
    title = scrapy.Field()
    company = scrapy.Field()
    location = scrapy.Field()


class PythonDocumentationSpider(scrapy.Spider):
    name = 'pydoc'
    start_urls = ['https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab']

    def parse(self, response):
        for follow_href in response.xpath('//h2[@class="fs-body2 job-details__spaced mb4"]/a/@href'):
            follow_url = response.urljoin(follow_href.extract())
            yield scrapy.Request(follow_url, callback=self.parse_page_title)
        for a_el in response.xpath('//div[@class="-job-summary"]'):
            section = JobItem()
            section['title'] = a_el.xpath('.//a[@class="s-link s-link__visited job-link"]/text()').extract()[0]
            span_texts = a_el.xpath('.//div[@class="fc-black-700 fs-body1 -company"]/span/text()').extract()
            section['company'] = span_texts[0]
            section['location'] = span_texts[1]
            print(section['location'])
            #print(type(section))
            yield section
I am attempting to get the following url capability to work with my code and then be able to crawl the pages and store job postings in csv file.
.extract() returns a list. In most cases, if you don't need a list, you should use .get() or .extract_first() instead.
First you need to rewrite this part:
for follow_href in response.xpath('//h2[@class="fs-body2 job-details__spaced mb4"]/a/@href').getall():  # or .extract()
    follow_url = response.urljoin(follow_href)
    yield scrapy.Request(follow_url, callback=self.parse_page_title)
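For reference, response.urljoin(href) resolves href against the URL of the page being parsed, much like the standard library's `urllib.parse.urljoin` (for HTML responses Scrapy also takes a `<base>` tag into account). Here is a small stand-alone sketch of that resolution step. The page URL and hrefs are made-up examples, not taken from the site in the question.

```python
from urllib.parse import urljoin

# Made-up page URL and link targets, for illustration only.
page_url = "https://example.com/jobs?pg=1"
hrefs = [
    "/jobs/123/python-developer",    # absolute path on the same host
    "?pg=2",                         # query-only link, e.g. a "next page" link
    "https://example.com/jobs/456",  # already absolute, left unchanged
]

# Resolve every href against the page it was found on.
resolved = [urljoin(page_url, href) for href in hrefs]
for url in resolved:
    print(url)
```

Each resolved URL is what would then be passed to scrapy.Request, which is why the loop needs plain strings (from .getall()) rather than selector objects.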

Tensorflow serving in Go

I am trying to run a keras model in Go. First I train the model in python:
import keras as krs
from keras import backend as K
import tensorflow as tf
sess = tf.Session()
K.set_session(sess)
K._LEARNING_PHASE = tf.constant(0)
K.set_learning_phase(0)
m1 = krs.models.Sequential()
m1.add(krs.layers.Dense(..., name="inputNode"))
...
m1.add(krs.layers.Dense(..., activation="softmax", name="outputNode"))
m1.compile(...)
m1.fit(...)
Then, as I understand it, it is advisable to freeze the model, which converts its variables to constants.
saver = tf.train.Saver()
tf.train.write_graph(sess.graph_def, '.', 'my_model.pbtxt')
saver.save(sess, save_path="my_model.ckpt")
from tensorflow.python.tools import freeze_graph
from tensorflow.python.tools import optimize_for_inference_lib
freeze_graph.freeze_graph(input_graph = 'my_model.pbtxt', input_saver = "",
    input_binary = False, input_checkpoint = "my_model.ckpt", output_node_names = "outputNode/Softmax",
    restore_op_name = "save/restore_all", filename_tensor_name = "save/Const:0",
    output_graph = "frozen_my_model.pb", clear_devices = True, initializer_nodes = "")
When trying to use the frozen model in Golang:
model, err := tf.LoadSavedModel("frozen_my_model.pb", []string{"serve"}, nil)
It returns an error saying that the tag serve was not found: SavedModel load for tags { serve }; Status: fail.
My questions are therefore:
How do you freeze a model in Python and then load it in Go?
I do this to speed up inference in Go. Is it correct that freezing models will improve inference speed?
I have noticed that another function, optimize_for_inference, exists. How would it be used in the above setting?
You have to "tag" the trained model when exporting it:
# Create a builder to export the model
builder = tf.saved_model.builder.SavedModelBuilder("export")
# Tag the model in order to be capable of restoring it specifying the tag set
builder.add_meta_graph_and_variables(sess, ["tag"])
builder.save()
After that, you can load it in Go, passing the same tag that you used when exporting.
However, a handier solution is to use tfgo.
Its README contains code for both steps, training in Python and inference in Go.
I reproduce it here:
Python: train LeNet on MNIST (example)
import sys
import tensorflow as tf
from dytb.inputs.predefined.MNIST import MNIST
from dytb.models.predefined.LeNetDropout import LeNetDropout
from dytb.train import train


def main():
    """main executes the operations described in the module docstring"""
    lenet = LeNetDropout()
    mnist = MNIST()
    info = train(
        model=lenet,
        dataset=mnist,
        hyperparameters={"epochs": 2},)
    checkpoint_path = info["paths"]["best"]
    with tf.Session() as sess:
        # Define a new model, import the weights from best model trained
        # Change the input structure to use a placeholder
        images = tf.placeholder(tf.float32, shape=(None, 28, 28, 1), name="input_")
        # define in the default graph the model that uses placeholder as input
        _ = lenet.get(images, mnist.num_classes)
        # The best checkpoint path contains just one checkpoint, thus the last is the best
        saver = tf.train.Saver()
        saver.restore(sess, tf.train.latest_checkpoint(checkpoint_path))
        # Create a builder to export the model
        builder = tf.saved_model.builder.SavedModelBuilder("export")
        # Tag the model in order to be capable of restoring it specifying the tag set
        builder.add_meta_graph_and_variables(sess, ["tag"])
        builder.save()
    return 0


if __name__ == '__main__':
    sys.exit(main())
Go: inference
package main

import (
	"fmt"

	tg "github.com/galeone/tfgo"
	tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

func main() {
	model := tg.LoadModel("test_models/export", []string{"tag"}, nil)
	fakeInput, _ := tf.NewTensor([1][28][28][1]float32{})
	results := model.Exec([]tf.Output{
		model.Op("LeNetDropout/softmax_linear/Identity", 0),
	}, map[tf.Output]*tf.Tensor{
		model.Op("input_", 0): fakeInput,
	})
	predictions := results[0].Value().([][]float32)
	fmt.Println(predictions)
}

How to save/load a trained model in H2O?

The user tutorial says
Navigate to Data > View All
Choose to filter by the model key
Hit Save Model
Input for path: /data/h2o-training/...
Hit Submit
The problem is that I do not have this menu (H2O 3.0.0.26, web interface).
I am, unfortunately, not familiar with the web interface but I can offer a workaround involving H2O in R. The functions
h2o.saveModel(object, dir = "", name = "", filename = "", force = FALSE)
and
h2o.loadModel(path, conn = h2o.getConnection())
should offer what you need. I will try to have a look at H2O Flow.
Update
I cannot find a way to explicitly save a model either. What you can do instead is save the 'Flow'. So you could upload/import your file, build the model, and then save/load that state :-)
When viewing the model in H2O Flow, you will see an 'Export' button among the actions that can be taken on a model.
From there, you will be prompted to specify a path in the 'Export Model' dialog. Specify the path and hit the 'Export' button. That will save your model to disk.
I'm referring to H2O version 3.2.0.3
Here is a working example that I used recently while building a deep learning model with H2O version 2.8.6. The model was saved in HDFS. For the latest version you will probably have to remove the classification argument and replace data with training_frame.
library(h2o)
h = h2o.init(ip="xx.xxx.xxx.xxx", port=54321, startH2O = F)
cTrain.h2o <- as.h2o(h, cTrain, key="c1")
cTest.h2o <- as.h2o(h, cTest, key="c2")
nh2oD <- h2o.deeplearning(x = c(1:12), y = "tgt", data = cTrain.h2o, classification = F, activation = "Tanh",
    rate = 0.001, rho = 0.99, momentum_start = 0.5, momentum_stable = 0.99, input_dropout_ratio = 0.2,
    hidden = c(12,25,11,11), hidden_dropout_ratios = c(0.4,0.4,0.4,0.4),
    epochs = 150, variable_importances = T, seed = 1234, reproducible = T, l1 = 1e-5,
    key = "dn")
hdfsdir <- "hdfs://xxxxxxxxxx/user/xxxxxx/xxxxx/models"
h2o.saveModel(nh2oD, hdfsdir, name="DLModel1", save_cv=T, force=T)
test = h2o.loadModel(h, path=paste0(hdfsdir, "/", "DLModel1"))
This should be what you need:
library(h2o)
h2o.init()
path = system.file("extdata", "prostate.csv", package = "h2o")
h2o_df = h2o.importFile(path)
h2o_df$CAPSULE = as.factor(h2o_df$CAPSULE)
model = h2o.glm(y = "CAPSULE",
                x = c("AGE", "RACE", "PSA", "GLEASON"),
                training_frame = h2o_df,
                family = "binomial")
h2o.download_pojo(model)
http://h2o-release.s3.amazonaws.com/h2o/rel-slater/5/docs-website/h2o-docs/index.html#POJO%20Quick%20Start
How to save models in H2O Flow:
Go to "List All Models".
In the model details, you will find an "Export" option.
Enter the name you want to save the model as.
Import it back again when needed.
How to save a model trained in h2o-py:
# say "rf" is your H2ORandomForestEstimator object. To export it
>>> path = h2o.save_model(rf, force=True) # save_model() returns the path
>>> path
u'/home/user/rf'
#to import it back again(as a new object)
>>> rafo = h2o.load_model(path)
>>> rafo # prints model details
Model Details
=============
H2ORandomForestEstimator : Distributed Random Forest
Model Key: drf1
Model Summary:
######Prints model details...................

Displaying custom tags in YRI

I've created some custom tags in my .yardopts file that look like this:
--name-tag optfield:"Field that is not required to be filled out by the user."
--name-tag nonfield:"Parameter that is not a field to be filled out by the user."
They are being displayed quite nicely when I view them in a web-browser, but when I try to view a class with yri I don't see my custom tags being displayed.
For example, this class
#This is a boring class
class Boring
  # This function is the index.
  # @param required_field [String] Required!
  # @optfield optional_argument [String] Totally not required dude.
  # @nonfield webData [WebData] Not even an option.
  def index(optional_argument = "", webData = $wedData)
  end
end
gets displayed like this in yri when I run yri Boring#index:
------------------------------------------------ Method: #index (Boring)
(Defined in: YardTest.rb)
boring.index(optional_argument = "", webData = $wedData) -> Object
------------------------------------------------------------------------
This function is the index.
Parameters:
-----------
(String) required_field - Required!
Is there a way to configure yri to display my custom tags?
