How can I access the text data (values) of feature importance from a trained model on Vertex AI with Python - google-ai-platform

I am working on predictive modeling with AutoML on Vertex AI and have a trained model. I checked its feature importance graphically on the Model tab in Vertex AI, and now I want to get the feature importance as text data with the code below, but I can only display it and cannot access each item as a value.
------------------------ Python Code
from google.cloud import aiplatform_v1 as aiplatform2
api_endpoint = 'us-central1-aiplatform.googleapis.com'
client_options = {"api_endpoint": api_endpoint} # api_endpoint is required for client_options
client_model = aiplatform2.services.model_service.ModelServiceClient(client_options=client_options)
project_id = 'this is my project id'
location = 'us-central1'
model_id = 'my trained id'
model_name = f'projects/{project_id}/locations/{location}/models/{model_id}'
list_eval_request = aiplatform2.types.ListModelEvaluationsRequest(parent=model_name)
list_eval = client_model.list_model_evaluations(request=list_eval_request)
list_eval.model_evaluations
------------------------ I get this just displayed visually in the notebook
[name: "projects/*********/locations/us-central1/models/*********/evaluations/*********"
metrics_schema_uri: "gs://google-cloud-aiplatform/schema/modelevaluation/regression_metrics_1.0.0.yaml"
metrics {
struct_value {
fields {
key: "meanAbsoluteError"
value {
number_value: 2863.7043
}
}
fields {
key: "meanAbsolutePercentageError"
value {
number_value: 197.63817
}
------------------------ Question
How can I access each "key" and its "value"?
For example, key: "meanAbsoluteError" / value: number_value: 2863.7043

I got it like this:
for a in list_eval.model_evaluations[0].metrics:
    b = str(list_eval.model_evaluations[0].metrics[a])
    v_str = a + ' : ' + b
    print(v_str)
rootMeanSquaredLogError : 1.2712421
rootMeanSquaredError : 26191.564
rSquared : 0.31798086
meanAbsoluteError : 5698.832
meanAbsolutePercentageError : 262.9534
"metrics" is simply a dictionary

Related

Properly evaluate a test dataset

I trained a machine translation model using the Hugging Face library:
def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # Some simple post-processing
    decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)
    result = metric.compute(predictions=decoded_preds, references=decoded_labels)
    result = {"bleu": result["score"]}
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    result["gen_len"] = np.mean(prediction_lens)
    result = {k: round(v, 4) for k, v in result.items()}
    return result

trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
trainer.train()

model_dir = './models/'
trainer.save_model(model_dir)
The code above is taken from this Google Colab notebook. After training, I can see that the trained model is saved to the models folder and the metric is calculated. Now I want to load the trained model and run predictions on a new dataset; here is what I tried:
dataset = load_dataset('csv', data_files='data/training_data.csv')
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
# Tokenize the test dataset
tokenized_datasets = train_test.map(preprocess_function_v2, batched=True)
test_dataset = tokenized_datasets['test']
model = AutoModelForSeq2SeqLM.from_pretrained('models')
model(test_dataset)
It threw the following error:
*** AttributeError: 'Dataset' object has no attribute 'size'
I tried the evaluate() function as well, but it said:
*** torch.nn.modules.module.ModuleAttributeError: 'MarianMTModel' object has no attribute 'evaluate'
And the eval() function only prints the model's configuration.
What is the proper way to evaluate the performance of the trained model on a new dataset?
It turned out that the predictions can be produced using the following code:
inputs = tokenizer(
    questions,
    max_length=max_input_length,
    truncation=True,
    return_tensors='pt',
    padding=True).to('cuda')
translation = model.generate(**inputs)
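To get readable text back, the generated ids just need to be decoded; a small follow-up sketch, assuming the same tokenizer and model objects as above:
# Decode the generated token ids back into strings;
# skip_special_tokens drops padding/EOS markers.
translated_text = tokenizer.batch_decode(translation, skip_special_tokens=True)
print(translated_text)
From there, the decoded strings can be scored against reference translations with the same metric.compute call used in compute_metrics above.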

Get position of token in BERT's output layer

We are interested in the BERT vectors for each token. By BERT vector we mean the word vector for a specific token in BERT's output layer. We would like to find out which token produces which BERT vector. We wrote some code, but we are not sure whether it is correct or how to test it.
So in the code we process a sentence with BERT. We construct a list of position ids and hand them to the model. Afterwards we use the same position ids to map the tokens to the output layer. Then there is some code that calculates the character offsets of each vector in the input sentence.
Is this the correct way to use position_ids to generate the token-to-vector mapping?
from transformers import BertModel, BertConfig, BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

def sentence_to_vector(input_sentence):
    tokens_encoded = tokenizer.encode(input_sentence, add_special_tokens=True)
    input_ids = torch.tensor(tokens_encoded).unsqueeze(0)  # Batch size 1
    seq_length = input_ids.size(1)

    # code to construct position_ids from here:
    # https://github.com/huggingface/transformers/blob/8da280ebbeca5ebd7561fd05af78c65df9161f92/pytorch_pretrained_bert/modeling.py#L188:L189
    position_ids = torch.arange(seq_length, dtype=torch.long, device=input_ids.device)
    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)

    outputs = model(input_ids, position_ids=position_ids)
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0])

    # from the BertModel documentation (example at the bottom):
    # The last hidden-state is the first element of the output tuple
    # https://huggingface.co/transformers/model_doc/bert.html#transformers.BertModel
    #ttv = {}  # token to vector
    #for i in position_ids[0]:
    #    ttv[tokens[i]] = outputs[0][0][position_ids[0][i]]

    data = []
    last_offset = 0
    for i in range(0, len(position_ids[0])):
        token = tokens[position_ids[0][i]]
        vector = outputs[0][0][position_ids[0][i]]
        pos_begin = None
        pos_end = None
        if not token == "[CLS]" and not token == "[SEP]":
            pos_begin = input_sentence.find(token, last_offset)
            pos_end = pos_begin + len(token)
            last_offset = pos_end
        data.append({
            "token": token,
            "pos_begin": pos_begin,
            "pos_end": pos_end,
            "vector": vector
        })
    return data

input_sentence = "do the chicken dance!"
data = sentence_to_vector(input_sentence)
for token in data:
    print(token["token"] + "\t" + str(token["pos_begin"]) + "\t" + str(token["pos_end"]) + "\t" + str(token["vector"][0:3]) + "...")
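One way to sanity-check those offsets: a fast tokenizer can return character offsets directly via return_offsets_mapping, which also handles "##" wordpieces that str.find would miss. A minimal alternative sketch, assuming a transformers version with fast tokenizer support:
from transformers import BertTokenizerFast

fast_tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
encoding = fast_tokenizer("do the chicken dance!", return_offsets_mapping=True)
tokens = fast_tokenizer.convert_ids_to_tokens(encoding["input_ids"])
# offset_mapping holds one (char_begin, char_end) pair per token,
# with (0, 0) for special tokens like [CLS] and [SEP].
for token, (begin, end) in zip(tokens, encoding["offset_mapping"]):
    print(token, begin, end)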

How do I interop with this JavaScript code from Fable F#?

I want to create a binding for the Plotly.js library in Fable.
I am looking at this JS code:
import React from 'react';
import Plot from 'react-plotly.js';

class App extends React.Component {
  render() {
    return (
      <Plot
        data={[
          {
            x: [1, 2, 3],
            y: [2, 6, 3],
            type: 'scatter',
            mode: 'lines+points',
            marker: {color: 'red'},
          },
          {type: 'bar', x: [1, 2, 3], y: [2, 5, 3]},
        ]}
        layout={ {width: 320, height: 240, title: 'A Fancy Plot'} }
      />
    );
  }
}
and my (faulty) attempt at creating a simple test binding looks like this:
open Fable.Core
open Fable.Core.JsInterop
open Browser.Types
open Fable.React

// module Props =
type Chart =
    | X of int list
    | Y of int list
    | Type of string

type IProp =
    | Data of obj list

let inline plot (props: IProp) : ReactElement =
    ofImport "Plot" "react-plotly.js" props []

let myTrace = createObj [
    "x" ==> [1,2,3]
    "y" ==> [2,6,3]
    "type" ==> "scatter"
    "mode" ==> "lines"
]

let myData = Data [myTrace]
let testPlot = plot myData
But obviously it does not work. How do I get it to work? Also, what does {[...]} mean? I am new to JavaScript, and as far as I know {...} denotes an object, which must contain name-value pairs, and [...] denotes an array. So {[...]} seems to denote an object with a single nameless member that is an array, but as far as I know there are no objects with nameless members.
I have been able to reproduce the example you linked. Please note that I don't know Plotly, and that I went the empirical way, so things can probably be improved :)
(As for {[...]}: that is JSX syntax rather than plain JavaScript. The outer braces embed a JavaScript expression in the attribute, and the expression here is simply an array literal.)
I created the code as I would probably have done it if I had to use it in my production app. So there is a bit more code than in your question, because I don't use createObj.
If you don't like the typed DSL you can always simplify it, remove it, and use createObj or an anonymous record like I did for the marker property :)
You need to install both react-plotly.js and plotly.js in your project (e.g. npm install react-plotly.js plotly.js).
open Fable.Core.JsInterop
open Fable.Core
open Fable.React

// Define props using DUs: this helps create a typed version of the React props.
// You can then transform a list of props into an object using `keyValueList`.
[<RequireQualifiedAccess>]
type LayoutProps =
    | Title of string
    | Width of int
    | Height of int

// GraphType is marked as a `StringEnum`: this means
// the values will be replaced at compile time with
// their string representation, so
// `Scatter` becomes `"scatter"`.
// You can customise the output by using `[<CompiledName("MyCustomName")>]`.
[<RequireQualifiedAccess; StringEnum>]
type GraphType =
    | Scatter
    | Bar

[<RequireQualifiedAccess; StringEnum>]
type GraphMode =
    | Lines
    | Points
    | Markers
    | Text
    | None

[<RequireQualifiedAccess>]
type DataProps =
    | X of obj array
    | Y of obj array
    | Type of GraphType
    | Marker of obj
    // This is a helper to generate the `flaglist` expected by Plotly. If you don't like it,
    // you can remove this member, replace it with `| Mode of string`,
    // and pass the string yourself.
    static member Mode (modes : GraphMode seq) : DataProps =
        let flags =
            modes
            |> Seq.map unbox<string> // This is safe because GraphMode is a StringEnum
            |> String.concat "+"
        unbox ("mode", flags)

[<RequireQualifiedAccess>]
type PlotProps =
    | Nothing // Should have real props here if more props than Data and Layout exist
    // Note that we are asking for an `array` of Data:
    // array is the type expected by the JavaScript library,
    // `DataProps seq` is our way to represent props.
    static member Data (dataList : (DataProps seq) array) : PlotProps =
        let datas =
            dataList
            |> Array.map (fun v ->
                keyValueList CaseRules.LowerFirst v // Transform the list of props into a JavaScript object
            )
        unbox ("data", datas)
    static member Layout (props : LayoutProps seq) : PlotProps =
        unbox ("layout", keyValueList CaseRules.LowerFirst props)

// All the examples I saw for react-plotly use this factory function to transform
// the plotly library into a React component, even the example you showed,
// if you look at the Babel tab in the live example.
let createPlotlyComponent (plotly : obj) = import "default" "react-plotly.js/factory"

// Import the plotly.js library
let plotlyLib : obj = import "default" "plotly.js"

// Apply the factory to the plotly library
let Plot : obj = createPlotlyComponent plotlyLib

// Helper function to instantiate the React component.
// This is really low level; in general we use `ofImport` like you did,
// but with `ofImport` I got a React error.
let inline renderPlot (plot : obj) (props : PlotProps list) =
    ReactBindings.React.createElement(plot, (keyValueList CaseRules.LowerFirst props), [])

let root =
    // Here we can render the plot using our typed DSL
    renderPlot
        Plot
        [
            PlotProps.Data
                [|
                    [
                        DataProps.X [| 1; 2; 3 |]
                        DataProps.Y [| 2; 6; 3 |]
                        DataProps.Type GraphType.Scatter
                        DataProps.Mode
                            [
                                GraphMode.Lines
                                GraphMode.Points
                            ]
                        DataProps.Marker {| color = "red" |}
                    ]
                    [
                        DataProps.Type GraphType.Bar
                        DataProps.X [| 1; 2; 3 |]
                        DataProps.Y [| 2; 5; 3 |]
                    ]
                |]
            PlotProps.Layout
                [
                    LayoutProps.Width 640
                    LayoutProps.Height 480
                    LayoutProps.Title "A Fancy Plot"
                ]
        ]
I'm a bit late to the party here, but wanted to give you a different option if you're still looking to use plotly.js with Fable.
I've been working on bindings for plotly.js for the past month or so, and it's in a pretty usable state as of now. That being said, I wouldn't say it's production ready.
This is what the example you want to convert would look like written with Feliz.Plotly:
open Feliz
open Feliz.Plotly

let chart () =
    Plotly.plot [
        plot.traces [
            traces.scatter [
                scatter.x [ 1; 2; 3 ]
                scatter.y [ 2; 6; 3 ]
                scatter.mode [
                    scatter.mode.lines
                    scatter.mode.markers
                ]
                scatter.marker [
                    marker.color color.red
                ]
            ]
            traces.bar [
                bar.x [ 1; 2; 3 ]
                bar.y [ 2; 5; 3 ]
            ]
        ]
        plot.layout [
            layout.width 320
            layout.height 240
            layout.title [
                title.text "A Fancy Plot"
            ]
        ]
    ]
You can find more information here.

How to update bokeh active interaction with GeoJSON as data source?

I have made an interactive choropleth map with Bokeh, and I'm trying to add active interactions using the dropdown widget (Select). However, most tutorials and SO questions about active interactions use ColumnDataSource, not GeoJSONDataSource.
The issue is that GeoJSONDataSource doesn't have a .data attribute the way ColumnDataSource does, so I don't know exactly what the syntax for updating it should look like.
My dataset is a dictionary in the form of city_dict = {'Amsterdam': <some data frame>, 'Antwerp': <some data frame>, ...}, where the dataframe is in GeoJSON format. I have already confirmed that this format works when making glyphs.
def update(attr, old, new):
    s_value = dropdown.value
    p.title.text = '%s', s_value
    new_src1 = make_dataset(s_value)
    val1 = GeoJSONDataSource(new_src1)
    r1.data_source = val1
where make_dataset is a function that transforms my original dataset into one that can be fed into GeoJSONDataSource. make_dataset requires a string (the name of a city) to work, e.g. 'Amsterdam'. It works for passive interactions.
The main plot code (removed unnecessary stuff) is:
dropdown = Select(value='Amsterdam', options = cities)
controls = WidgetBox(dropdown)
initial_city = 'Amsterdam'
a = make_dataset(initial_city)
src1 = GeoJSONDataSource(a)
p = figure(title = 'Amsterdam', plot_height = 750 , plot_width = 900, toolbar_location = 'right')
r1 = p.patches('xs','ys', source = src1, fill_color = {'field' :'norm', 'transform' : color_mapper})
dropdown.on_change('value', update)
layout = row(controls, p)
curdoc().add_root(layout)
I've added the error I get:
error handling message Message 'PATCH-DOC' (revision 1) content: {'events': [{'kind': 'ModelChanged', 'model': {'type': 'Select', 'id': '1147'}, 'attr': 'value', 'new': 'Antwerp'}], 'references': []}: ValueError("expected a value of type str, got ('%s', 'Antwerp') of type tuple",)
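For what it's worth, the ValueError in that message points at the title assignment: p.title.text = '%s', s_value builds a tuple, not a string. A sketch of an update callback that avoids this and pushes new data by assigning to the existing source's geojson property (GeoJSONDataSource holds its data as a GeoJSON string rather than a .data dict), assuming make_dataset returns a GeoJSON string as above:
def update(attr, old, new):
    s_value = dropdown.value
    p.title.text = s_value            # assign the string itself, not a tuple
    new_src1 = make_dataset(s_value)  # assumed to return a GeoJSON string
    src1.geojson = new_src1           # update the existing source in place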

Unable to predict when loading a Tensorflow model in Go

I've loaded a TensorFlow model in Go and cannot get predictions: it keeps complaining about a shape mismatch on a simple 2D array. I'd appreciate any ideas here; thank you so much in advance.
Error running the session with input, err: You must feed a value for placeholder tensor 'theoutput_target' with dtype float
[[Node: theoutput_target = Placeholder[_output_shapes=[[?,?]], dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Input tensor being sent is a [][]float32{ {1.0}, }
a := [][]float32{ {1.0}, }
tensor, terr := tf.NewTensor(a)
if terr != nil {
    fmt.Printf("Error creating input tensor: %s\n", terr.Error())
    return
}
result, runErr := model.Session.Run(
    map[tf.Output]*tf.Tensor{
        model.Graph.Operation("theinput").Output(0): tensor,
    },
    []tf.Output{
        model.Graph.Operation("theoutput_target").Output(0),
    },
    nil,
)
and the model is generated via Keras and exported to TF using SavedModelBuilder, as follows:
layer_name_input = "theinput"
layer_name_output = "theoutput"

def get_encoder():
    model = Sequential()
    model.add(Dense(5, input_dim=1))
    model.add(Activation("relu"))
    model.add(Dense(5, input_dim=1))
    return model

inputs = Input(shape=(1, ), name=layer_name_input)
encoder = get_encoder()
model = encoder(inputs)
model = Activation("relu")(model)
objective = Dense(1, name=layer_name_output)(model)
model = Model(inputs=[inputs], outputs=objective)
model.compile(loss='mean_squared_error', optimizer='sgd')
EDIT - fixed, it was a problem with exporting from Keras to TF (layer names). Pasting the export here, hopefully helpful for someone else:
def export_to_tf(keras_model_path, export_path, export_version, is_functional=False):
    sess = tf.Session()
    K.set_session(sess)
    K.set_learning_phase(0)
    export_path = os.path.join(export_path, str(export_version))
    model = load_model(keras_model_path)
    config = model.get_config()
    weights = model.get_weights()
    if is_functional == True:
        model = Model.from_config(config)
    else:
        model = Sequential.from_config(config)
    model.set_weights(weights)
    with K.get_session() as sess:
        inputs = [(model_input.name.split(":")[0], model_input) for model_input in model.inputs]
        outputs = [(model_output.name.split(":")[0], model_output) for model_output in model.outputs]
        signature = predict_signature_def(inputs=dict(inputs),
                                          outputs=dict(outputs))
        input_descriptor = [{'name': item[0], 'shape': item[1].shape.as_list()} for item in inputs]
        output_descriptor = [{'name': item[0], 'shape': item[1].shape.as_list()} for item in outputs]
        builder = saved_model_builder.SavedModelBuilder(export_path)
        builder.add_meta_graph_and_variables(
            sess=sess,
            tags=[tag_constants.SERVING],
            signature_def_map={signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature})
        builder.save()
        descriptor = dict()
        descriptor["inputs"] = input_descriptor
        descriptor["outputs"] = output_descriptor
        pprint.pprint(descriptor)
There's something strange in your code and error: TensorFlow is complaining about a missing value for the placeholder named 'theoutput_target', whilst that placeholder is never defined in the code you posted. Instead, your code defines a placeholder named 'theinput'. The name suggests that 'theoutput_target' is the target placeholder Keras creates for computing the loss during training, not the op that actually produces the model's output, which would explain why fetching it fails.
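A quick way to find the real output op name is to load the export in Python and print the graph's operations (a debugging sketch, TF1-style to match the export code above; 'export_dir' stands in for your SavedModel path):
import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    # 'serve' matches the tag_constants.SERVING tag used by the exporter above
    tf.saved_model.loader.load(sess, ['serve'], 'export_dir')
    for op in sess.graph.get_operations():
        print(op.name)
Fetching the op that actually produces the model output, rather than the '_target' placeholder, should make Session.Run in Go succeed.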
Also, I suggest you use a more complete and easy-to-use wrapper around the TensorFlow API: tfgo
