Configure subnetwork for Vertex AI pipeline component - google-cloud-vertex-ai

I have a vertex ai pipeline component that needs to connect to a database. This database exists in a VPC network. Currently my component is failing because it is not able to connect to the database, but I believe I can get it to work if I can configure the component to use the subnetwork.
How do I configure the workerPoolSpecs of the component to use the subnetwork?
I was hoping I could do something like this:
preprocess_data_op = component_store.load_component('org/ml_engine/preprocess')

@dsl.pipeline(name="test-pipeline-vertex-ai")
def pipeline(project_id: str, some_param: str):
    preprocess_data_op(
        project_id=project_id,
        my_param=some_param,
        subnetwork_uri="projects/xxxxxxxxx/global/networks/data",
    ).set_display_name("Preprocess data")
However, the parameter is not there, and I get:
TypeError: Preprocess() got an unexpected keyword argument 'subnetwork_uri'
How do I define the subnetwork for the component?

From the Google docs, there is no mention of how to run a specific component on a subnetwork.
However, it is possible to run the entire pipeline in a subnetwork by passing the network as part of the job submit API:
job.submit(service_account=SERVICE_ACCOUNT, network=NETWORK)
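For example, with the google-cloud-aiplatform SDK, a minimal sketch could look like the following, assuming the pipeline has already been compiled to pipeline.json and that PROJECT_ID, REGION, PIPELINE_ROOT, SERVICE_ACCOUNT and NETWORK are placeholders you fill in (the network must be given by its full resource name, e.g. projects/PROJECT_NUMBER/global/networks/data):

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

job = aiplatform.PipelineJob(
    display_name="test-pipeline-vertex-ai",
    template_path="pipeline.json",
    pipeline_root=PIPELINE_ROOT,
    parameter_values={"project_id": PROJECT_ID, "some_param": "value"},
)

# The network is configured on the job, not on an individual component,
# so every component in the pipeline runs inside the peered VPC.
job.submit(service_account=SERVICE_ACCOUNT, network=NETWORK)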

Related

Migrate kubeflow docker image components to VertexAI pipeline

I am trying to migrate a custom component created in kubeflow to VertexAI.
In Kubeflow I used to create components as docker container images and then load them into my pipeline as follows:
def my_custom_component_op(gcs_dataset_path: str, some_param: str):
    return kfp.dsl.ContainerOp(
        name='My Custom Component Step',
        image='gcr.io/my-project-23r2/my-custom-component:latest',
        arguments=["--gcs_dataset_path", gcs_dataset_path,
                   '--component_param', some_param],
        file_outputs={
            'output': '/app/output.csv',
        }
    )
I would then use them in the pipeline as follows:
@kfp.dsl.pipeline(
    name='My custom pipeline',
    description='The custom pipeline'
)
def generic_pipeline(project_id, some_param):
    output_component = my_custom_component_op(
        gcs_dataset_path=gcs_dataset_path,
        some_param=some_param
    )
    output_next_op = next_op(
        gcs_dataset_path=dsl.InputArgumentPath(output_component.outputs['output']),
        next_op_param="some other param"
    )
Can I reuse the same component Docker image from Kubeflow v1 in a Vertex AI pipeline? How can I do that, hopefully without changing anything in the component itself?
I have found examples online of Vertex AI pipelines that use the @component decorator as follows:
@component(base_image=PYTHON37, packages_to_install=[PANDAS])
def my_component_op(
    gcs_dataset_path: str,
    some_param: str,
    dataset: Output[Dataset],
):
...perform some op....
But this would require me to copy-paste the Docker code into my pipeline, and that is not really something I want to do. Is there a way to reuse the Docker image and pass the parameters? I couldn't find any example of that anywhere.
You need to prepare a component YAML file and load it with load_component_from_file.
This is well documented on the Kubeflow documentation page for the KFP v2 SDK.
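As a rough sketch, assuming KFP v2, the existing image can be reused without modifying it by describing it in a component spec like the one below. The argument names and the --output_path flag are only illustrative; unlike the hard-coded /app/output.csv used with file_outputs, the container has to write its result to the output path the pipeline passes in.

from kfp import components

# Hypothetical component spec mirroring the ContainerOp from the question;
# adjust the image, command and argument names to your container.
component_text = """
name: My Custom Component Step
inputs:
- {name: gcs_dataset_path, type: String}
- {name: component_param, type: String}
outputs:
- {name: output}
implementation:
  container:
    image: gcr.io/my-project-23r2/my-custom-component:latest
    args:
    - --gcs_dataset_path
    - {inputValue: gcs_dataset_path}
    - --component_param
    - {inputValue: component_param}
    - --output_path
    - {outputPath: output}
"""

# Save the spec to component.yaml and use
# components.load_component_from_file("component.yaml"),
# or load it directly from the string:
my_custom_component_op = components.load_component_from_text(component_text)

The loaded my_custom_component_op can then be called inside the @kfp.dsl.pipeline function in the same way as the old ContainerOp-based factory.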

Nesting custom resources in Chef

I am trying to build a custom resource which would in turn use another of my custom resources as part of its action. The pseudo-code would look something like this:
# Custom resource A (e.g. resources/custom_resource_a.rb)
property :component_id, String

action :do_something do
  component_id = new_resource.component_id || 1
  node.default['component_details'][component_id] = ''

  custom_resource_b 'Get me component details' do
    comp_id component_id
    action :get_component_details
  end

  Chef::Log.info('See the output computed by my customResourceB')
  Chef::Log.info(node['component_details'][component_id])
end
Things to note:
1. The role of customResource_b is to make a PowerShell call to a REST web service and store the JSON result in node['component_details'][component_id], overriding its value. I am creating this node attribute in this resource since I know it will be used later on, hence avoiding compile-time issues.
Issues I am facing:
1. When testing a simple recipe that calls this resource with chef-client, the code in the resource runs all the way to the last log line, and only after that is the call to customResource_b made, which is not what I expect to happen.
Any advice would be appreciated. I am also quite new to Chef, so any design improvements are welcome as well.
There is no need to nest Chef resources; rather, use Chef's idempotence, guards, and notifications.
And as usual, you can always use a condition to decide which cookbook/recipe to run.

How to use kubebuilder's client.List method?

I'm working on a custom controller for a custom resource using kubebuilder (version 1.0.8). I have a scenario where I need to get a list of all the instances of my custom resource so I can sync up with an external database.
All the examples I've seen for kubernetes controllers use either client-go or just call the api server directly over http. However, kubebuilder has also given me this client.Client object to get and list resources. So I'm trying to use that.
After creating a client instance by using the passed in Manager instance (i.e. do mgr.GetClient()), I then tried to write some code to get the list of all the Environment resources I created.
func syncClusterWithDatabase(c client.Client, db *dynamodb.DynamoDB) {
    // Sync environments
    // Step 1 - read all the environments the cluster knows about
    clusterEnvironments := &cdsv1alpha1.EnvironmentList{}
    c.List(context.Background(), /* what do I put here? */, clusterEnvironments)
}
The example in the documentation for the List method shows:
c.List(context.Background, &result);
which doesn't even compile.
I saw a few methods in the client package to limit the search to particular labels, or to a specific field with a specific value, but nothing to limit the result to a specific resource kind.
Is there a way to do this via the Client object? Should I do something else entirely?
So I figured it out - the answer is to pass nil for the second parameter. The type of the output pointer determines which sort of resource it actually retrieves.
According to the latest documentation, the List method is defined as follows:
List(ctx context.Context, list ObjectList, opts ...ListOption) error
If the List method you are calling has the same definition as above, your code should compile. Since it takes variadic options to set the namespace and field matches, the only mandatory arguments are the context and the object list.
Ref: KubeBuilder Book

Extract returned values from VirtualNetworkPeering object

How can I extract the attribute values returned from a list of peered virtual networks?
I executed this command, and I need to extract the network ID:
list_all = network_client.virtual_network_peerings.list(
    GROUP_NAME,
    VNET_NAME
)
for peer in list_all:
    print(peer)
and I get this value from the print above:
{'additional_properties': {'type': 'Microsoft.Network/virtualNetworks/virtualNetworkPeerings'},
'id': '/subscriptions/c70b9b-efd6-497d-98d8-e1e1d497425/resourceGroups/azure-sample-group-virtual-machines/providers/Microsoft.Network/virtualNetworks/azure-sample-vnet/virtualNetworkPeerings/sample-vnetpeer',
'allow_virtual_network_access': True,
'allow_forwarded_traffic': True,
'allow_gateway_transit': False,
'use_remote_gateways': False,
'remote_virtual_network': <azure.mgmt.network.v2018_08_01.models.sub_resource_py3.SubResource object at 0x048D6950>,
'remote_address_space': <azure.mgmt.network.v2018_08_01.models.address_space_py3.AddressSpace object at 0x048D68D0>,
'peering_state': 'Initiated',
'provisioning_state': 'Succeeded',
'name': 'sample-vnetpeer',
'etag': 'W/"653f7f94-3c4e-4275-bfdf-0bbbd9beb6e4"'}
How can I get the value of "remote_virtual_network"?
My feeling is that your question is actually more of a Python question than an Azure question. Assuming this field is set with values in your application, remote_virtual_network is a SubResource, meaning it only has one attribute: id.
for peer in list_all:
    remote_virtual_network_id = peer.remote_virtual_network.id
That ID points to an actual virtual network, so if you want details about it you need to fetch it with network_client.virtual_networks.get:
https://learn.microsoft.com/en-us/python/api/azure-mgmt-network/azure.mgmt.network.v2018_08_01.operations.virtualnetworksoperations?view=azure-python#get
The tricky part is that you get an ID, while the VNet get call asks for a resource group name and a VNet name; you can use the ARM ID parser for that:
https://learn.microsoft.com/en-us/python/api/msrestazure/msrestazure.tools?view=azure-python#parse-resource-id
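Putting those two pieces together, a minimal sketch (assuming msrestazure is installed and network_client is already authenticated, as in your snippet) could look like this:

from msrestazure.tools import parse_resource_id

for peer in list_all:
    remote_id = peer.remote_virtual_network.id
    parts = parse_resource_id(remote_id)

    # parse_resource_id returns a dict containing, among others,
    # 'resource_group' and 'name' for the remote VNet
    remote_vnet = network_client.virtual_networks.get(
        parts['resource_group'],
        parts['name'],
    )
    print(remote_vnet.name, remote_vnet.address_space.address_prefixes)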
I will try this out and get back.
This command is analogous to "Get-AzureRmVirtualNetworkPeering -ResourceGroupName -VirtualNetworkName -Name" in Azure PowerShell.
As for remote_virtual_network, you don't have one yet. You will only get this if you have Remote Gateway enabled and the peer learns the IP address of the remote (on-premises) site that you are trying to connect to.
To get this value, deploy a gateway in the VNet and connect it to, say, "Vnet-S2S-test" with a gateway deployed there as well.
Once the site-to-site connection between the VNets is up, you can execute the command and you should see those fields populated with the local network gateway details.

How do I set an alarm to terminate an EC2 instance using boto?

I have been unable to find a simple example which shows me how to use boto to terminate an Amazon EC2 instance using an alarm (without using AutoScaling). I want to terminate the specific instance that has a CPU usage less than 1% for 10 minutes.
Here is what I've tried so far:
import boto.ec2
import boto.ec2.cloudwatch
from boto.ec2.cloudwatch import MetricAlarm

conn = boto.ec2.connect_to_region("us-east-1", aws_access_key_id=ACCESS_KEY,
                                  aws_secret_access_key=SECRET_KEY)
cw = boto.ec2.cloudwatch.connect_to_region("us-east-1", aws_access_key_id=ACCESS_KEY,
                                           aws_secret_access_key=SECRET_KEY)

reservations = conn.get_all_instances()
for r in reservations:
    for inst in r.instances:
        alarm = boto.ec2.cloudwatch.MetricAlarm(
            name='TestAlarm', description='This is a test alarm.',
            namespace='AWS/EC2', metric='CPUUtilization', statistic='Average',
            comparison='<=', threshold=1, period=300, evaluation_periods=2,
            dimensions={'InstanceId': [inst.id]},
            alarm_actions=['arn:aws:automate:us-east-1:ec2:terminate'])
        cw.put_metric_alarm(alarm)
Unfortunately it gives me this error:
dimensions={'InstanceId':[inst.id]}, alarm_actions=['arn:aws:automate:us-east-1:ec2:terminate'])
TypeError: __init__() got an unexpected keyword argument 'alarm_actions'
I'm sure it's something simple I'm missing.
Also, I am not using CloudFormation, so I cannot use the AutoScaling feature. This is because I don't want the alarm to use a metric across an entire group; I only want it for a specific instance, and I only want to terminate that specific instance (not any instance in a group).
Thanks in advance for your help!
The alarm actions are not passed as a constructor keyword argument but rather added to the MetricAlarm object after it is created. In your code you need to do the following:
alarm = boto.ec2.cloudwatch.MetricAlarm(
    name='TestAlarm', description='This is a test alarm.', namespace='AWS/EC2',
    metric='CPUUtilization', statistic='Average', comparison='<=', threshold=1,
    period=300, evaluation_periods=2, dimensions={'InstanceId': [inst.id]})
alarm.add_alarm_action('arn:aws:automate:us-east-1:ec2:terminate')
cw.put_metric_alarm(alarm)
You can also see this in the boto documentation here:
http://docs.pythonboto.org/en/latest/ref/cloudwatch.html#module-boto.ec2.cloudwatch.alarm
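As a usage sketch under the same assumptions as the question (ACCESS_KEY/SECRET_KEY placeholders, us-east-1), the full loop with the fix applied could look like this; the alarm name is made unique per instance, since reusing 'TestAlarm' would overwrite the previous alarm on each iteration:

for r in reservations:
    for inst in r.instances:
        alarm = boto.ec2.cloudwatch.MetricAlarm(
            name='LowCPUAlarm-%s' % inst.id,   # unique name per instance
            description='Terminate %s after 10 minutes below 1%% CPU.' % inst.id,
            namespace='AWS/EC2', metric='CPUUtilization', statistic='Average',
            comparison='<=', threshold=1, period=300, evaluation_periods=2,
            dimensions={'InstanceId': [inst.id]})
        alarm.add_alarm_action('arn:aws:automate:us-east-1:ec2:terminate')
        cw.put_metric_alarm(alarm)

# Optionally confirm the alarms were created and check their state:
for a in cw.describe_alarms():
    print(a.name, a.state_value)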
