How to structure container logs in Vertex AI?

I have a model in Vertex AI. From the logs it seems that Vertex AI has ingested the log into a message field within the jsonPayload field, but I would like to structure the jsonPayload field so that every key in message becomes a field within jsonPayload, i.e. flatten/extract message.

The logs in Cloud Logging (formerly Stackdriver) follow a defined LogEntry schema. Cloud Logging uses structured logs, where log entries use the jsonPayload field to add structure to their payload.
For Vertex AI, the container output is passed inside the message field which you see in the logs, and this structure is predefined. However, if you want to extract the fields that are present inside the message block, you can use the workarounds below:
1. Create a sink:
You can export your logs to a Cloud Storage bucket, BigQuery, Pub/Sub, etc.
If you use BigQuery as the sink, you can then use BigQuery's JSON functions to extract the required data (see the first sketch after this list).
2. Download the logs and write your custom code:
You can download the log files and then write your own logic to extract data as per your requirements.
You can use the Cloud Logging client library (Python) together with Python's JSON functions to write that logic (see the second sketch after this list).
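For the BigQuery route, here is a minimal sketch of flattening message with BigQuery's JSON functions; the dataset/table name and the keys extracted are assumptions for illustration, not part of the original answer:
from google.cloud import bigquery

bq_client = bigquery.Client()

# Assumption: the sink exports the endpoint logs into this table, and
# jsonPayload.message holds a serialized JSON object.
query = """
SELECT
  JSON_EXTRACT_SCALAR(jsonPayload.message, '$.some_key') AS some_key,
  JSON_EXTRACT_SCALAR(jsonPayload.message, '$.other_key') AS other_key
FROM `my_project.my_logs_dataset.aiplatform_googleapis_com_endpoint`
"""
for row in bq_client.query(query).result():
    print(row.some_key, row.other_key)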
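For the download-and-parse route, here is a minimal sketch using the Cloud Logging Python client library; the filter string and the shape of message are assumptions:
import json

from google.cloud import logging as cloud_logging

logging_client = cloud_logging.Client()

# Assumption: pull the endpoint's entries and flatten the JSON string stored
# in jsonPayload.message into a plain Python dict per entry.
entries = logging_client.list_entries(
    filter_='resource.type="aiplatform.googleapis.com/Endpoint"'
)
for entry in entries:
    payload = entry.payload
    message = payload.get("message") if isinstance(payload, dict) else None
    if not message:
        continue
    try:
        flattened = json.loads(message)  # every key inside message becomes a dict key
    except (TypeError, json.JSONDecodeError):
        continue
    print(flattened)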

Using the google-cloud-logging client to write structured logs against a Vertex AI endpoint:
(Make sure you have a service account with permissions to write logs into the GCP project, and, for clean logs, make sure you don't stream any other logs to stderr or stdout.)
import json

import google.cloud.logging_v2 as logging_v2
from google.api_core.client_options import ClientOptions
from google.oauth2 import service_account

# Your structured payload: every key here becomes a field under jsonPayload.
data_to_write_to_endpoint = {key1: value1, ...}

# JSON key for a service account permitted to write logs into the GCP
# project where your endpoint is
credentials = service_account.Credentials.from_service_account_info(
    json.loads(SERVICE_ACCOUNT_KEY_JSON)
)
client = logging_v2.client.Client(
    credentials=credentials,
    client_options=ClientOptions(api_endpoint="logging.googleapis.com"),
)

# This resource represents your Vertex AI endpoint
resource = logging_v2.Resource(
    type="aiplatform.googleapis.com/Endpoint",
    labels={"endpoint_id": YOUR_ENDPOINT_ID, "location": ENDPOINT_REGION},
)

logger = client.logger("LOGGER_NAME")
logger.log_struct(
    info=data_to_write_to_endpoint,
    severity=severity,  # e.g. "INFO"
    resource=resource,
)
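Once log_struct() has run, each key of data_to_write_to_endpoint shows up as its own field under jsonPayload for the endpoint resource. Here is a minimal sketch of reading those entries back with the same client; the filter and the use of list_entries are my assumptions rather than part of the original answer:
# Assumption: reuse the client created above to read the structured entries back.
log_filter = (
    'resource.type="aiplatform.googleapis.com/Endpoint" '
    f'AND resource.labels.endpoint_id="{YOUR_ENDPOINT_ID}"'
)
for entry in client.list_entries(filter_=log_filter):
    # entry.payload is a dict whose keys are the fields you logged
    print(entry.payload)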

Related

Configure subnetwork for vertex ai pipeline component

I have a vertex ai pipeline component that needs to connect to a database. This database exists in a VPC network. Currently my component is failing because it is not able to connect to the database, but I believe I can get it to work if I can configure the component to use the subnetwork.
How do I configure the workerPoolSpecs of the component to use the subnetwork?
I was hoping I could do something like that:
preprocess_data_op = component_store.load_component('org/ml_engine/preprocess')

@dsl.pipeline(name="test-pipeline-vertex-ai")
def pipeline(project_id: str, some_param: str):
    preprocess_data_op(
        project_id=project_id,
        my_param=some_param,
        subnetwork_uri="projects/xxxxxxxxx/global/networks/data",
    ).set_display_name("Preprocess data")
However, the param is not there, and I get
TypeError: Preprocess() got an unexpected keyword argument 'subnetwork_uri'
How do I define the subnetwork for the component?
From the Google docs, there is no mention of how to run a specific component on a subnetwork.
However, it is possible to run the entire pipeline in a subnetwork by passing the network as part of the job submit API:
job.submit(service_account=SERVICE_ACCOUNT, network=NETWORK)
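For context, here is a minimal sketch of what the full submission could look like with the Vertex AI SDK; the template path, pipeline root, parameter names, and project/region values are assumptions for illustration:
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

job = aiplatform.PipelineJob(
    display_name="test-pipeline-vertex-ai",
    template_path="pipeline.json",                 # compiled pipeline spec (assumption)
    pipeline_root="gs://my-bucket/pipeline-root",  # assumption
    parameter_values={"project_id": "my-project", "some_param": "value"},
)

# NETWORK must be the full resource name of the VPC the database lives in,
# e.g. "projects/xxxxxxxxx/global/networks/data".
job.submit(service_account=SERVICE_ACCOUNT, network=NETWORK)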

Golang logging JSON in CloudWatch without added Backslashes

Golang version: 1.18.3
Logging libs used:
"log"
"github.com/kdar/logrus-cloudwatchlogs"
"github.com/sirupsen/logrus"
I'm writing an AWS Lambda using Go. It sits behind an AWS API Gateway as a REST API.
I'm trying to log the requests coming into it and the responses it sends back.
Let's say the example request payload is:
const jsonData = `{"name": "Yomiko","address": {"city": "Tokyo","street": "Shibaura St"},"children":[{"lastName": "Takayashi"}],"isEmployed": false}`
When I log it using standard "log" library as below,
log.Println("JSON Data: ", jsonData)
It's printed nicely in the targeted CloudWatch Stream maintaining the JSON format as below,
Image 1: CloudWatch log with standard "log" library
However, when I used logrus with logrus-cloudwatchlogs as below,
hook, err := logrus_cloudwatchlogs.NewHook(transactionLogGroup, transactionLogStream, sess)
l := logrus.New()
l.Hooks.Add(hook)
l.Out = ioutil.Discard
l.Formatter = &logrus.JSONFormatter{}
l.Println("JSON Data: ", jsonData)
It adds backslashes as shown below and loses the nice JSON Format,
Image 2: CloudWatch log with "logrus" library
My question for you good people is,
How can I preserve the nice JSON format as shown in Image 1 when I use the "logrus" library? Please note that using the standard "log" library is not an option here.

How to set OpenSearch/Elasticsearch as the destination of a Kinesis Firehose?

I am trying to create Data Stream -> Firehose -> OpenSearch infrastructure using the AWS CDK v2. I was surprised to find that, although OpenSearch is a supported Firehose destination, there is nothing in the CDK to support this use case.
In my CDK Stack I have created an OpenSearch Domain, and am trying to create a Kinesis Firehose DeliveryStream with that domain as the destination. However, the kinesisfirehose-destinations package seems to have only a ready-to-use destination for S3 buckets, so there is no obvious way to do this easily using only the constructs supplied by the aws-cdk, not even the alpha packages.
I think I should be able to write an OpenSearch destination construct by implementing IDestination. I have tried the following simplistic implementation:
import {Construct} from "constructs"
import * as firehose from "@aws-cdk/aws-kinesisfirehose-alpha"
import {aws_opensearchservice as opensearch} from "aws-cdk-lib"

export class OpenSearchDomainDestination implements firehose.IDestination {
    private readonly dest: opensearch.Domain

    constructor(dest: opensearch.Domain) {
        this.dest = dest
    }

    bind(scope: Construct, options: firehose.DestinationBindOptions): firehose.DestinationConfig {
        return {dependables: [this.dest]}
    }
}
then I can use it like so,
export class MyStack extends Stack {
    ...
    private createFirehose(input: kinesis.Stream, output: opensearch.Domain) {
        const destination = new OpenSearchDomainDestination(output)
        const deliveryStream = new firehose.DeliveryStream(this, "FirehoseDeliveryStream", {
            destinations: [destination],
            sourceStream: input,
        })
        input.grantRead(deliveryStream)
        output.grantWrite(deliveryStream)
    }
}
This will compile and cdk synth runs just fine. However, I get the following error when running cdk deploy:
CREATE_FAILED | AWS::KinesisFirehose::DeliveryStream | ... Resource handler returned message: "Exactly one destination configuration is supported for a Firehose
I'm not sure I understand this message but it seems to imply that it will reject outright everything except the one provided S3 bucket destination.
So, my titular question could be answered by the answer to either of these two questions:
How are you supposed to implement bind in IDestination?
Are there any complete working examples of creating a Firehose to OpenSearch using the non-alpha L1 constructs?
(FYI I have also asked this question on the AWS forum but have not yet received an answer.)
Destinations other than S3 are not supported by the L2 constructs at the moment. This is described at https://docs.aws.amazon.com/cdk/api/v1/docs/aws-kinesisfirehose-readme.html
In such cases, I go to the source code to see what can be done. See https://github.com/aws/aws-cdk/blob/master/packages/%40aws-cdk/aws-kinesisfirehose/lib/destination.ts . There is no easy way to inject a destination other than S3, since DestinationConfig does not support it. You can see at https://github.com/aws/aws-cdk/blob/master/packages/%40aws-cdk/aws-kinesisfirehose-destinations/lib/s3-bucket.ts how the config for S3 is crafted, and you can see how that config is translated to the L1 construct CfnDeliveryStream at https://github.com/aws/aws-cdk/blob/f82d96bfed427f8e49910ac7c77004765b2f5f6c/packages/%40aws-cdk/aws-kinesisfirehose/lib/delivery-stream.ts#L364
Probably the easiest way at the moment is to drop down to the L1 constructs and define the OpenSearch destination there.

Can I record workout data to GoogleFit from WearOS app?

I'm making a fitness wearOS app.
I want to record workouts completed with the app to GoogleFit.
Is there a way to do that from WearOS?
I start a workout using HealthServices:
suspend fun startExercise() {
    val dataTypes = setOf(
        DataType.HEART_RATE_BPM,
        DataType.LOCATION
    )
    val aggregateDataTypes = setOf(
        DataType.DISTANCE,
        DataType.TOTAL_CALORIES
    )
    val config = ExerciseConfig.builder()
        .setExerciseType(ExerciseType.RUNNING)
        .setDataTypes(dataTypes)
        .setAggregateDataTypes(aggregateDataTypes)
        .setShouldEnableAutoPauseAndResume(false)
        .setShouldEnableGps(true)
        .build()
    HealthServices.getClient(this /*context*/)
        .exerciseClient
        .startExercise(config)
        .await()
}
(Code is from this example https://developer.android.com/training/wearables/health-services/active#start)
I was expecting that if I start/end a workout with Health Services it would auto-magically sync the data to Google Fit (Apple does this with HealthKit).
So, can I record workout data to GoogleFit from a WearOS app?
To add to Yuri's comment that this isn't possible automatically, the SessionClient is probably what you'd want in order to do this manually. The flow would be:
Collect data with Health Services
Transform
Insert session with SessionClient
The "insert a session" snippet in the Google Fit docs is a relevant example, as it both sets the session type (in this case, running) and also adds the underlying data (instead of doing that separately with HistoryClient).
Update: You may also wish to take a look at Health Connect, which was recently announced.

How to get GCP Audit Log status programmatically

I'm trying to get a list of Audit Logs similar to what is displayed in the Google Cloud console page (IAM/Audit Logs), using the Golang API GetIamPolicy as described here:
https://cloud.google.com/resource-manager/reference/rest/v1/projects/getIamPolicy
If one service has at least one of its Log Types set (Data Read, Data Write or Admin Read), GetIamPolicy will return it, but if it does not have any set then the service is omitted from the response.
As an example, if my project has three services A, B and C and A has Data Read enabled, B has Admin Read enabled and C doesn't have anything enabled, GetIamPolicy will only return A and B.
The GetIamPolicyRequest struct seems to have fields designed for this scenario (NullFields and ForceSendFields), but I couldn't make it work. Example:
rb := &cloudresourcemanager.GetIamPolicyRequest{}
rb.ForceSendFields = []string{"LogType"}
rb.NullFields = []string{"LogType"}
policyOptions := &cloudresourcemanager.GetPolicyOptions{}
policyOptions.ForceSendFields = []string{"LogType"}
policyOptions.NullFields = []string{"LogType"}
policyOptions.RequestedPolicyVersion = 3
rb.Options = policyOptions
Any ideas on how to retrieve the missing services?
