Do AutoML predictions not work when uploaded to Google Cloud Functions? - google-cloud-automl

I'm writing code that makes a prediction based on a trained AutoML multi-label classifier. The function works if I run it locally; however, as soon as I upload the same code to Cloud Functions on GCP (a process that I know usually works), it gives me this error:
TypeError: predict() takes from 1 to 2 positional arguments but 4 were given
Here is a sample of my code, taken straight from the AutoML documentation with some slight adjustments.
from google.api_core.client_options import ClientOptions
from google.cloud import automl_v1beta1

def get_sentiment(content):
    """
    Returns a Google Cloud Platform payload class containing the sentiment score given by our NLP sentiment analyser.
    :param content: STRING (UTF-8 encoded, ASCII)
    :return: <class 'google.cloud.automl.types.PredictResponse'>
    """
    options = ClientOptions(api_endpoint='automl.googleapis.com')
    prediction_client = automl_v1beta1.PredictionServiceClient(client_options=options)
    name = model_sentiment  # full resource name of the deployed AutoML model, defined elsewhere
    payload = {'text_snippet': {'content': content, 'mime_type': 'text/plain'}}
    params = {}
    request = prediction_client.predict(name, payload, params)
    return request
I have tried removing the params variable from the predict call and replacing payload with content; the only change is that I get the error:
TypeError: predict() takes from 1 to 2 positional arguments but 3 were given
Additionally, I have replaced automl_v1beta1 with automl and automl_v1, and again, while both work locally, they do not work on Google Cloud.
Thank you for any advice or help

Update: apparently there are some bugs in the latest version of the google-cloud-automl library, and the error was fixed by running the code against a previous version of it, in my case v0.9.0.
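For anyone who would rather stay on a current release than pin the library: the TypeError suggests the newer, auto-generated client no longer accepts positional arguments. Below is a minimal sketch of the adjusted call, assuming a google-cloud-automl release whose predict method takes a single request object; verify the exact signature against the version you actually deploy.

from google.api_core.client_options import ClientOptions
from google.cloud import automl_v1beta1

def get_sentiment_newer_client(content, model_name):
    # model_name is the full AutoML model resource name (hypothetical parameter here)
    options = ClientOptions(api_endpoint='automl.googleapis.com')
    client = automl_v1beta1.PredictionServiceClient(client_options=options)
    payload = {'text_snippet': {'content': content, 'mime_type': 'text/plain'}}
    # Newer clients take one request mapping (or keyword arguments) instead of
    # the positional (name, payload, params) call used above
    return client.predict(request={'name': model_name, 'payload': payload})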

Related

Ruby YouTube Data API v3 insert caption always returns error

I am trying to use the Ruby SDK to upload videos to YouTube automatically. Inserting a video, deleting a video, and setting the thumbnail for a video works fine, but for some reason trying to add captions results in an invalid metadata client error regardless of the parameters I use.
I wrote code based on the documentation and code samples in other languages (I can't find any examples of doing this in Ruby with the current gem). I am using the google-apis-youtube_v3 gem, version 0.22.0.
Here is the relevant part of my code (assuming I have uploaded a video with id 'XYZ123'):
require 'googleauth'
require 'googleauth/stores/file_token_store'
require 'google-apis-youtube_v3'

def authorize
  [... auth code omitted ...]
end

def get_service
  service = Google::Apis::YoutubeV3::YouTubeService.new
  service.key = API_KEY
  service.client_options.application_name = APPLICATION_NAME
  service.authorization = authorize
  service
end

body = {
  "snippet": {
    "videoId": 'XYZ123',
    "language": 'en',
    "name": 'English'
  }
}

s = get_service
s.insert_caption('snippet', body, upload_source: '/path/to/my-captions.vtt')
I have tried many different combinations, but the result is always the same:
Google::Apis::ClientError: invalidMetadata: The request contains invalid metadata values, which prevent the track from being created. Confirm that the request specifies valid values for the snippet.language, snippet.name, and snippet.videoId properties. The snippet.isDraft property can also be included, but it is not required. status_code: 400
It seems that there really is not much choice for the language and video ID values, and there is nothing remarkable about naming the captions as "English". I am really at a loss as to what could be wrong with the values I am passing in.
Incidentally, I get exactly the same response even if I just pass in nil as the body.
I looked at the OVERVIEW.md file included with the google-apis-youtube_v3 gem, which referred to the Google simple REST client Usage Guide. That guide mentions that most object properties do not use camelCase (which is what the underlying JSON representation uses); instead, in most cases properties must be sent using Ruby's snake_case convention.
Thus it turns out that the snippet should specify video_id and not videoId.
That seems to have let the request go through, so this resolves this issue.
The response I'm getting now has a status of "failed" and a failure reason of "processingFailed", but that may be the subject of another question if I can't figure it out.

LightGBM 'Using categorical_feature in Dataset.' Warning?

From my reading of the LightGBM documentation, one is supposed to define categorical features in the Dataset constructor. So I have the following code:
cats=['C1', 'C2']
d_train = lgb.Dataset(X, label=y, categorical_feature=cats)
However, I received the following warning message:
/app/anaconda3/anaconda3/lib/python3.7/site-packages/lightgbm/basic.py:1243: UserWarning: Using categorical_feature in Dataset.
warnings.warn('Using categorical_feature in Dataset.')
Why did I get the warning message?
I presume that you get this warning in a call to lgb.train. This function also has a categorical_feature argument, and its default value is 'auto', which means taking categorical columns from a pandas.DataFrame (documentation). The warning, which is emitted at this line, indicates that even though lgb.train requested that categorical features be identified automatically, LightGBM will use the features specified in the dataset instead.
To avoid the warning, you can give the same argument categorical_feature to both lgb.Dataset and lgb.train. Alternatively, you can construct the dataset with categorical_feature=None and only specify the categorical features in lgb.train.
As user andrey-popov described, you can use lgb.train's categorical_feature parameter to get rid of this warning.
Below is a simple example of how you could do it:
# Define categorical features
cat_feats = ['item_id', 'dept_id', 'store_id',
             'cat_id', 'state_id', 'event_name_1',
             'event_type_1', 'event_name_2', 'event_type_2']
...
# Define the datasets with the categorical_feature parameter
train_data = lgb.Dataset(X.loc[train_idx],
                         Y.loc[train_idx],
                         categorical_feature=cat_feats,
                         free_raw_data=False)
valid_data = lgb.Dataset(X.loc[valid_idx],
                         Y.loc[valid_idx],
                         categorical_feature=cat_feats,
                         free_raw_data=False)

# And train using the categorical_feature parameter
lgb.train(lgb_params,
          train_data,
          valid_sets=[valid_data],
          verbose_eval=20,
          categorical_feature=cat_feats,
          num_boost_round=1200)
This is less of an answer to the original question and more of an answer for people who are using the sklearn API and encounter this issue.
For those of you who are using the sklearn API, especially one of the cross_val methods from sklearn, there are two solutions you could consider.
Sklearn API solution
A solution that worked for me was to cast categorical fields into the category datatype in pandas.
If you are using a pandas DataFrame, LightGBM should automatically treat those columns as categorical. From the documentation:
integer codes will be extracted from pandas categoricals in the
Python-package
It would make sense for this to be the equivalent in the sklearn API to setting categoricals in the Dataset object.
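As a minimal sketch of that idea (toy data, assuming the standard LightGBM sklearn estimator), casting a string column to the pandas category dtype is enough for it to be picked up as categorical:

import lightgbm as lgb
import pandas as pd

# Toy frame: 'store_id' is a string column we want treated as categorical
X = pd.DataFrame({'store_id': ['A', 'B', 'A', 'C'] * 25,
                  'price': [1.0, 2.5, 3.0, 0.5] * 25})
y = [0, 1, 0, 1] * 25

# Cast to the 'category' dtype so the sklearn API detects it automatically
X['store_id'] = X['store_id'].astype('category')

clf = lgb.LGBMClassifier(n_estimators=10)
clf.fit(X, y)  # no categorical_feature argument needed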
But keep in mind that LightGBM does not officially support virtually any of the non-core parameters for the sklearn API, and they say so explicitly:
**kwargs is not supported in sklearn, it may cause unexpected issues.
Adaptive Solution
The other, more sure-fire solution for being able to use methods like cross_val_predict is to create your own wrapper class that uses the core Dataset/train API under the hood but exposes a fit/predict interface for the cv methods to latch onto. That way you get the full functionality of LightGBM with only a little bit of rolling your own code.
The below sketches out what this could look like.
import lightgbm as ltb

class LGBMSKLWrapper:
    def __init__(self, categorical_variables, params):
        self.categorical_variables = categorical_variables
        self.params = params
        self.model = None

    def fit(self, X, y):
        my_dataset = ltb.Dataset(X, y, categorical_feature=self.categorical_variables)
        self.model = ltb.train(params=self.params, train_set=my_dataset)

    def predict(self, X):
        return self.model.predict(X)
The above lets you load up your parameters when you create the object, and then passes them on to train when the client calls fit.
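One caveat to add (my own assumption, not something from the answer above): sklearn's cross-validation helpers clone the estimator, and cloning requires get_params/set_params. A quick way to satisfy that is to also inherit from sklearn.base.BaseEstimator and give the constructor arguments defaults, roughly like this:

from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_val_predict

class CloneableLGBMWrapper(LGBMSKLWrapper, BaseEstimator):
    # Defaults let BaseEstimator's get_params/set_params reconstruct the object when cloned
    def __init__(self, categorical_variables=None, params=None):
        super().__init__(categorical_variables, params)

# Hypothetical usage, reusing the names defined earlier in this answer
preds = cross_val_predict(CloneableLGBMWrapper(cat_feats, lgb_params), X, y, cv=5)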

How to use Ruby to send image to a deployed Sagemaker endpoint running a TensorFlow/Keras CNN?

I have trained a CNN using Tensorflow/Keras and successfully deployed it to Sagemaker using the saved_model format. It answers pings and the dashboard shows it is running.
I now need to be able to send it images and get back inferences. I have already successfully deployed an ANN to Sagemaker and gotten predictions back, so most of the "plumbing" is already working.
The Ruby performing the request is as follows:
def predict
  sagemaker = Aws::SageMakerRuntime::Client.new(
    access_key_id: Settings.sagemaker_key_id,
    secret_access_key: Settings.sagemaker_secret,
    region: Settings.sagemaker_aws_region
  )
  response = sagemaker.invoke_endpoint(endpoint_name: Settings.sagemaker_endpoint_name,
                                       content_type: 'application/x-image',
                                       body: File.open('developer/ai/caox_test_128.jpg', 'rb'))
  return response[:body].string
end
(For now, I simply hardcoded a known file for testing.)
When I fire this, I get back this error:
Aws::SageMakerRuntime::Errors::ModelError: Received client error (400) from model with message "{ "error": "JSON Parse error: Invalid value. at offset: 0" }"
It's almost as if the model is expecting more in the body than just the image, but I can't tell what. AWS's documentation has an example for Python using boto:
import boto3
import json

endpoint = '<insert name of your endpoint here>'
runtime = boto3.Session().client('sagemaker-runtime')

# Read image into memory
with open(image, 'rb') as f:
    payload = f.read()

# Send image via InvokeEndpoint API
response = runtime.invoke_endpoint(EndpointName=endpoint, ContentType='application/x-image', Body=payload)

# Unpack response
result = json.loads(response['Body'].read().decode())
As far as I can tell, they are simply opening a file and sending it directly to SageMaker with no additional pre-processing, and I'm doing exactly what they are doing in Ruby, just using 'aws-sdk'.
I've looked through Amazon's documentation, and for examples on Google, but there is scant mention of doing anything special before sending the file, so I'm scratching my head.
What else do I need to consider when sending a file to a Sagemaker endpoint running a TensorFlow/Keras CNN to get it to respond with a prediction?

How can I get ALL records from Route53?

I am referring to the code snippet here, which seemed to work for someone but is not clear to me: https://github.com/aws/aws-sdk-ruby/issues/620
I'm trying to get all of my records (about 7,000) via resource record sets, but I can't seem to get the pagination to work with list_resource_record_sets. Here's what I have:
route53 = Aws::Route53::Client.new
response = route53.list_resource_record_sets({
  start_record_name: fqdn(name),
  start_record_type: type,
  max_items: 100, # fyi - aws api maximum is 100 so we'll need to page
})

response.last_page?
response = response.next_page until response.last_page?
I verified I'm hooked into the right region, and I can see the record I'm trying to get (so I can delete it later) in the AWS console, but I can't seem to get it through the API. I used this as a starting point: https://github.com/aws/aws-sdk-ruby/issues/620
Any ideas on what I'm doing wrong? Or is there an easier way, perhaps another method in the API I'm not finding, to get just the record I need given the hosted_zone_id, type, and name?
The issue you linked is for the Ruby AWS SDK v2, but the latest is v3. It also looks like things may have changed around a bit since 2014, as I'm not seeing the #next_page or #last_page? methods in the v2 API or the v3 API.
Consider using the #next_record_name and #next_record_type from the response when #is_truncated is true. That's more consistent with how other paginations work in the Ruby AWS SDK, such as with DynamoDB scans for example.
Something like the following should work (though I don't have an AWS account with records to test it out):
route53 = Aws::Route53::Client.new
hosted_zone = ? # Required field according to the API docs

next_name = fqdn(name)
next_type = type

loop do
  response = route53.list_resource_record_sets(
    hosted_zone_id: hosted_zone,
    start_record_name: next_name,
    start_record_type: next_type,
    max_items: 100, # fyi - aws api maximum is 100 so we'll need to page
  )

  records = response.resource_record_sets

  # Break here if you find the record you want
  # Also break if we've run out of pages
  break unless response.is_truncated

  next_name = response.next_record_name
  next_type = response.next_record_type
end

Avoid repeated calls to an API in Jekyll Ruby plugin

I have written a Jekyll plugin to display the number of pageviews on a page by calling the Google Analytics API using the garb gem. The only trouble with my approach is that it makes a call to the API for each page, slowing down build time and also potentially hitting the user call limits on the API.
It would be possible to return all the data in a single call and store it locally, and then look up the pageview count from each page, but my Jekyll/Ruby-fu isn't up to scratch. I do not know how to write the plugin to run once to get all the data and store it locally where my current function could then access it, rather than calling the API page by page.
Basically my code is written as a liquid block that can be put into my page layout:
class GoogleAnalytics < Liquid::Block
  def initialize(tag_name, markup, tokens)
    super # options that appear in block (between tag and endtag)
    #options = markup # optional options passed in by opening tag
  end

  def render(context)
    path = super

    # Read in credentials and authenticate
    cred = YAML.load_file("/home/cboettig/.garb_auth.yaml")
    Garb::Session.api_key = cred[:api_key]
    token = Garb::Session.login(cred[:username], cred[:password])
    profile = Garb::Management::Profile.all.detect {|p| p.web_property_id == cred[:ua]}

    # place query, customize to modify results
    data = Exits.results(profile,
                         :filters => {:page_path.eql => path},
                         :start_date => Chronic.parse("2011-01-01"))
    data.first.pageviews
  end
end
Full version of my plugin is here
How can I move all the calls to the API into some other function, make sure Jekyll runs it once at the start, and then adjust the tag above to read that local data?
EDIT: Looks like this can be done with a Generator and writing the data to a file. See the example on this branch. Now I just need to figure out how to subset the results: https://github.com/Sija/garb/issues/22
To store the data, I had to:

1. Write a Generator class (see the Jekyll wiki on plugins) to call the API.

2. Convert the data to a hash for easy lookup by path (see step 5):

   result = Hash[data.collect{|row| [row.page_path, [row.exits, row.pageviews]]}]

3. Write the data hash to a JSON file.

4. Read in the data from the file in my existing Liquid block class. Note that the block tag works from the _includes dir, while the generator works from the root directory.

5. Match the page path, easy once the data is converted to a hash:

   result[path][1]
Code for the full plugin, showing how to create the generator and write files, etc., is here.
And thanks to Sija on GitHub for help on this.
