How to save/load a trained model in H2O?

The user tutorial says:
Navigate to Data > View All
Choose to filter by the model key
Hit Save Model
Input for path: /data/h2o-training/...
Hit Submit
The problem is that I do not have this menu (H2O 3.0.0.26, web interface).

I am, unfortunately, not familiar with the web interface, but I can offer a workaround involving H2O in R. The functions
h2o.saveModel(object, dir = "", name = "", filename = "", force = FALSE)
and
h2o.loadModel(path, conn = h2o.getConnection())
should offer what you need. I will try to have a look at H2O Flow.
Update
I cannot find a way to explicitly save a model either. What you can do instead is save the 'Flow': you could upload/import your file, build the model, and then save/load that state :-)

When viewing the model in H2O Flow, you will see an 'Export' button as an action that can be taken against a model.
From there, you will be prompted to specify a path in the 'Export Model' dialog. Specify the path and hit the 'Export' button. That will save your model to disk.
I'm referring to H2O version 3.2.0.3.

A working example that I used recently while building a deep learning model with H2O version 2.8.6. The model was saved in HDFS. For the latest version you probably have to remove the classification=T switch and replace data with training_frame.
library(h2o)
# Connect to an already-running H2O cluster
h = h2o.init(ip="xx.xxx.xxx.xxx", port=54321, startH2O = F)
# Push the local data frames into H2O
cTrain.h2o <- as.h2o(h, cTrain, key="c1")
cTest.h2o <- as.h2o(h, cTest, key="c2")
# Train the deep learning model
nh2oD <- h2o.deeplearning(x = c(1:12), y = "tgt", data = cTrain.h2o, classification = F, activation = "Tanh",
                          rate = 0.001, rho = 0.99, momentum_start = 0.5, momentum_stable = 0.99, input_dropout_ratio = 0.2,
                          hidden = c(12,25,11,11), hidden_dropout_ratios = c(0.4,0.4,0.4,0.4),
                          epochs = 150, variable_importances = T, seed = 1234, reproducible = T, l1 = 1e-5,
                          key = "dn")
# Save the model to HDFS, then load it back
hdfsdir <- "hdfs://xxxxxxxxxx/user/xxxxxx/xxxxx/models"
h2o.saveModel(nh2oD, hdfsdir, name = "DLModel1", save_cv = T, force = T)
test = h2o.loadModel(h, path = paste0(hdfsdir, "/", "DLModel1"))

This should be what you need:
library(h2o)
h2o.init()
path = system.file("extdata", "prostate.csv", package = "h2o")
h2o_df = h2o.importFile(path)
h2o_df$CAPSULE = as.factor(h2o_df$CAPSULE)
model = h2o.glm(y = "CAPSULE",
                x = c("AGE", "RACE", "PSA", "GLEASON"),
                training_frame = h2o_df,
                family = "binomial")
h2o.download_pojo(model)
http://h2o-release.s3.amazonaws.com/h2o/rel-slater/5/docs-website/h2o-docs/index.html#POJO%20Quick%20Start
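If you are working from Python rather than R, the h2o module exposes an analogous POJO helper. The call below is only a sketch; check the exact signature against your installed h2o version.
import h2o
# Sketch only: download the generated POJO .java file for an existing model object.
# The path is just an example location.
h2o.download_pojo(model, path="/tmp/pojo")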

How to save models in H2O Flow:
go to "List All Models"
in the model details, you will find an "Export" option
enter the model name you want to save it as
to load it back later, import the exported model again
How to save a model trained in h2o-py:
# say "rf" is your H2ORandomForestEstimator object. To export it
>>> path = h2o.save_model(rf, force=True) # save_model() returns the path
>>> path
u'/home/user/rf'
# to import it back again (as a new object)
>>> rafo = h2o.load_model(path)
>>> rafo # prints model details
Model Details
=============
H2ORandomForestEstimator : Distributed Random Forest
Model Key: drf1
Model Summary:
# ... model summary output follows ...

Related

Access information from a previous Kubeflow component

I have a ModelBatchPredictOp component in my pipeline. This component generates 3 artifacts: batchpredictionjob, big_query_table, and gcs_output_directory. The pipeline is running fine.
What I need is a way to access the tableId property of artifact big_query_table, so I can use it in the next component (a BigqueryQueryJobOp).
That is what I want. The URI information would be good as well, since it contains the full path of the created table and I can extract the desired part from it.
Below are my 2nd component, which creates the batch prediction, and the 3rd component, which should consume the previous outputs.
# Component to do the batch prediction
batch_predict_op = ModelBatchPredictOp(
    project=project_id,
    location=DEFAULT_VERTEX_REGION,
    instances_format='bigquery',
    predictions_format='bigquery',
    model=importer_spec.outputs['artifact'],
    job_display_name='teste_batch_predict',
    bigquery_source_input_uri=f'bq://{input_data_table_ref}',
    bigquery_destination_output_uri=f'bq://{output_bq}',
).after(input_data_table_op)

top_predictions_table_ref = f'{project_id}.{bigquery_dataset}.test'

# Component to create the table based on previous component
top_predictions_op = bq.BigqueryQueryJobOp(
    project_id,
    location=bigquery_job_location,
    query=predict_dataset.get_query(
        output_table=top_predictions_table_ref,
        source_table=batch_predict_op.outputs['bigquery_output_table'],
        query_name='query_top_100.sql',
        DEBUG=DEBUG)
).after(batch_predict_op)
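One way to get at the tableId (a sketch, not tested against this exact pipeline) is a small custom component that receives the big_query_table artifact and reads its metadata. The metadata keys used below ('projectId', 'datasetId', 'tableId') are assumptions based on the google.BQTable artifact schema, so inspect the artifact from one of your runs to confirm them.
from kfp import dsl                      # on older SDKs this may be `from kfp.v2 import dsl`
from kfp.dsl import Input, Artifact

# Hypothetical helper component (not part of the original pipeline): it receives the
# big_query_table artifact from ModelBatchPredictOp and returns the fully-qualified
# table id assembled from the artifact's metadata (keys are assumptions, see above).
@dsl.component(base_image="python:3.9")
def get_table_id(table: Input[Artifact]) -> str:
    return f"{table.metadata['projectId']}.{table.metadata['datasetId']}.{table.metadata['tableId']}"

# Usage inside the pipeline: the returned string can then be passed to BigqueryQueryJobOp.
# table_id_op = get_table_id(table=batch_predict_op.outputs['bigquery_output_table'])
The artifact's uri attribute also carries the full path mentioned in the question, so returning table.uri and parsing it downstream is an alternative.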

How do I do model versioning with AutoMLImageTrainingJobRunOp?

I know in Vertex AI you can version models. You can eg upload a model and set its parent_model:
model_v2 = aip.Model.upload(parent_model=model_v1.resource_name,...
And I know in the GUI you can create an AutoML model that is a version of an existing one, but how do you do it in code?
In a pipeline I use AutoMLImageTrainingJobRunOp but it does not have a parent_model parameter.
You can enable model versioning using the below code snippet:
from google.cloud import aiplatform

DISPLAY_NAME = "model_name"
models = aiplatform.Model.list(filter=("display_name={}").format(DISPLAY_NAME))

if len(models) == 0:
    model_upload = aiplatform.Model.upload(
        display_name=DISPLAY_NAME,  # Your model display name
        version_description="Add model description here",  # Add model description
        version_aliases=["v1"],  # Create model alias
        labels={"release": "dev"},  # Label your model
        artifact_uri=model.uri[:-6],
        ...
    )
else:
    parent_model = models[0].resource_name
    version_id = models[0].version_id
    model_upload = aiplatform.Model.upload(
        display_name=DISPLAY_NAME,
        ...,
        parent_model=parent_model
    )
There are also other parameters in the code for serving containers; you can remove them if you don't need them.
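As a quick sanity check (a sketch, assuming model_upload is the object returned by the snippet above), the returned Model exposes which version it was registered as:
# Sketch only: confirm the upload created a new version of the same model resource
# rather than a brand-new model.
print(model_upload.resource_name)  # same resource name as the parent model
print(model_upload.version_id)     # increments with each uploaded version, e.g. "2"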

How do I enable following URL capability to work in my code?

I am attempting to add follow-URL capability but can't seem to get it to work. I need to crawl all the pages; there are around 108 pages of job listings. Thank you.
import scrapy

class JobItem(scrapy.Item):
    # Data structure to store the title, company name and location of the job
    title = scrapy.Field()
    company = scrapy.Field()
    location = scrapy.Field()

class PythonDocumentationSpider(scrapy.Spider):
    name = 'pydoc'
    start_urls = ['https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab']

    def parse(self, response):
        for follow_href in response.xpath('//h2[@class="fs-body2 job-details__spaced mb4"]/a/@href'):
            follow_url = response.urljoin(follow_href.extract())
            yield scrapy.Request(follow_url, callback=self.parse_page_title)

        for a_el in response.xpath('//div[@class="-job-summary"]'):
            section = JobItem()
            section['title'] = a_el.xpath('.//a[@class="s-link s-link__visited job-link"]/text()').extract()[0]
            span_texts = a_el.xpath('.//div[@class="fc-black-700 fs-body1 -company"]/span/text()').extract()
            section['company'] = span_texts[0]
            section['location'] = span_texts[1]
            print(section['location'])
            #print(type(section))
            yield section
I am attempting to get the follow-URL capability to work with my code so that I can crawl the pages and store the job postings in a CSV file.
.extract() returns a list. In most cases you'll want to use .get() or .extract_first() instead if you don't need a list.
First you need to rewrite this part:
for follow_href in response.xpath('//h2[@class="fs-body2 job-details__spaced mb4"]/a/@href').getall():  # or .extract()
    follow_url = response.urljoin(follow_href)
    yield scrapy.Request(follow_url, callback=self.parse_page_title)
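To walk through all ~108 result pages you also need to follow the pagination link from parse(). A minimal sketch is below; the CSS selector for the "next" link is a placeholder and has to be adapted to the actual markup of the listing page.
# Pagination sketch -- the selector is hypothetical, inspect the page and adjust it.
# Add this at the end of parse() so every listing page queues the next one.
next_page = response.css('a.js-next-link::attr(href)').get()
if next_page:
    yield response.follow(next_page, callback=self.parse)
Once the items are yielded, Scrapy's built-in feed exports will write them out for you, e.g. scrapy crawl pydoc -o jobs.csv.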

How to find Knowledge base ID (kbid) for QnAMaker?

I am trying to integrate QnAmaker knowledge base with Azure Bot Service.
I am unable to find knowledge base id on QnAMaker portal.
How to find the kbid in QnAPortal?
The Knowledge Base Id can be located in Settings under “Deployment details” in your knowledge base. It is the GUID nestled between “knowledgebases” and “generateAnswer” in the POST example shown there.
Hope this helps!
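If you want to pull the id out of that endpoint programmatically, here is a small sketch (the URL below is a dummy in the same format, not a real endpoint):
import re

# Dummy endpoint in the format shown under "Deployment details"; the GUID between
# /knowledgebases/ and /generateAnswer is the knowledge base id.
endpoint = "https://my-qna.azurewebsites.net/qnamaker/knowledgebases/00000000-0000-0000-0000-000000000000/generateAnswer"
match = re.search(r"/knowledgebases/([0-9a-fA-F-]+)/generateAnswer", endpoint)
if match:
    print(match.group(1))  # the kbid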
You can also use Python to get this; take a look at the following code. This is useful if you want a program that fetches the knowledge base ids dynamically.
import http.client, os, urllib.parse, json, time, sys

# Represents the various elements used to create the HTTP request path for QnA Maker operations.
# Replace this with a valid subscription key.
host = '<your-resource-name>.cognitiveservices.azure.com'
subscription_key = '<QnA-Key>'
get_kb_method = '/qnamaker/v4.0/knowledgebases/'

try:
    headers = {
        'Ocp-Apim-Subscription-Key': subscription_key,
        'Content-Type': 'application/json'
    }
    conn = http.client.HTTPSConnection(host)
    conn.request("GET", get_kb_method, None, headers)
    response = conn.getresponse()
    data = response.read().decode("UTF-8")
    result = None
    if len(data) > 0:
        result = json.loads(data)
        # print(json.dumps(result, sort_keys=True, indent=2))
    # A 2xx status code means success.
    KB_id = result["knowledgebases"][0]["id"]
    print(response.status)
    print(KB_id)
except:
    print("Unexpected error:", sys.exc_info()[0])
    print("Unexpected error:", sys.exc_info()[1])

Diff of Value Changes inside Reference Changes

I have an Application entity and a User entity. There's a one-to-many relation from Application to User, i.e. one application can have multiple users.
I update a user and change the application assigned to them. So far I have the below:
Diff diff = javers.compare(oldUser, newUser);
List<ReferenceChange> referenceChanges = diff.getChangesByType(ReferenceChange.class);
for (ReferenceChange referenceChange : referenceChanges) {
    Object oldRef = referenceChange.getLeftObject().get();
    Object newRef = referenceChange.getRightObject().get();
    Diff refDiff = javers.compare(oldRef, newRef);
}
Please find the below result while debugging:
referenceChanges List:
ReferenceChange{globalId:'com.ds.appmanager.services.domain.User/3', property:'application', oldRef:'com.ds.appmanager.services.domain.Application/7', newRef:'com.ds.appmanager.services.domain.Application/3'}
oldRef
Application [applicationId=7, applicationName=KYC, applicationDesc=DEV_TEST, applicationLaunch=2016-12-31 00:00:00.0, live=false]
newRef
Application [applicationId=3, applicationName=KYC, applicationDesc=hellp, applicationLaunch=Sat Mar 12 00:00:00 IST 2016, live=true]
When I pass both objects to javers.compare(oldRef, newRef), it again gives me the same result as referenceChanges. I thought it would give me Value Changes for each property, e.g. applicationDesc, oldVal = DEV_TEST, newVal = hellp.
Am I missing something here? All I want to do is get the value changes of the properties of the Application object that is referenced inside the User object.
P.S.: If I change Object oldRef = referenceChange.getLeftObject().get(); to Object oldRef = referenceChange.getLeftObject();, I get JaversException: CLASS_EXTRACTION_ERROR JaVers bootstrap error - Don't know how to extract Class from type 'T'.
UPDATE
Just noted that I am not getting ValueChanges when doing Diff refDiff = javers.compare(oldRef, newRef);
Debug shows the following changes:
Diff:
1. NewObject{globalId:'com.ds.appmanager.services.domain.Application/7'}
2. ObjectRemoved{globalId:'com.ds.appmanager.services.domain.User/5'}
3. ObjectRemoved{globalId:'com.ds.appmanager.services.domain.Application/3'}
4. ObjectRemoved{globalId:'com.ds.appmanager.services.domain.User/6'}
5. ObjectRemoved{globalId:'com.ds.appmanager.services.domain.User/7'}
6. ReferenceChange{globalId:'com.ds.appmanager.services.domain.User/3', property:'application', oldRef:'com.ds.appmanager.services.domain.Application/3', newRef:'com.ds.appmanager.services.domain.Application/7'}
