Add tag while creating EBS snapshot using boto3 - amazon-ec2

Is it possible to add a tag when invoking the create_snapshot() method in boto3? When I run the following code:
import boto3

client = boto3.client('ec2')
root_snap_resp = client.create_snapshot(
    Description='My snapshot description',
    VolumeId='vol-123456',
    Tags=[{'Key': 'Test_Key', 'Value': 'Test_Value'}]
)
I get the following error:
botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in input: "Tags", must be one of: DryRun, VolumeId, Description
Is the only way to add a tag after the fact using the create_tags() method?

In April 2018, the original answer (and the question itself) were made obsolete...
You can now specify tags for EBS snapshots as part of the API call that creates the resource or via the Amazon EC2 Console when creating an EBS snapshot.
https://aws.amazon.com/blogs/compute/tag-amazon-ebs-snapshots-on-creation-and-implement-stronger-security-policies/
...unless you are using an older version of an SDK that does not implement the feature.
The same announcement extended resource-level permissions to snapshots.
The underlying CreateSnapshot action in the EC2 API doesn't have any provision for adding tags simultaneously with the creation of the snapshot. You have to go back and tag it after creating it.
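With a current SDK, by contrast, a minimal client-level sketch of tag-on-create looks like this (the volume ID and tag values are placeholders):
import boto3

client = boto3.client('ec2')

# TagSpecifications applies the tags atomically at snapshot creation,
# so no separate create_tags() call is needed
response = client.create_snapshot(
    Description='My snapshot description',
    VolumeId='vol-123456',  # placeholder volume ID
    TagSpecifications=[
        {
            'ResourceType': 'snapshot',
            'Tags': [{'Key': 'Test_Key', 'Value': 'Test_Value'}],
        },
    ],
)
print(response['SnapshotId'])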

import boto3

ec2 = boto3.resource('ec2')

volume = ec2.Volume('vol-xxxxxxxxxx')

snapshot = ec2.create_snapshot(
    VolumeId=volume.id,
    TagSpecifications=[
        {
            'ResourceType': 'snapshot',
            'Tags': volume.tags,
        },
    ],
    Description='Snapshot of volume ({})'.format(volume.id),
)
@fender4645 You can now specify tags for EBS snapshots as part of the API call that creates the resource.

Have a look at my backup script:
import boto3
import collections
import datetime

ec = boto3.client('ec2')


def lambda_handler(event, context):
    reservations = ec.describe_instances(
        Filters=[
            {'Name': 'tag:Backup', 'Values': ['Yes', 'yes']}
        ]
    ).get(
        'Reservations', []
    )
    instances = sum(
        [
            [i for i in r['Instances']]
            for r in reservations
        ], [])

    print "Found %d instances that need backing up" % len(instances)

    to_tag = collections.defaultdict(list)

    for instance in instances:
        try:
            retention_days = [
                int(t.get('Value')) for t in instance['Tags']
                if t['Key'] == 'Retention'][0]
        except IndexError:
            retention_days = 30

        for dev in instance['BlockDeviceMappings']:
            if dev.get('Ebs', None) is None:
                continue
            vol_id = dev['Ebs']['VolumeId']
            print "Found EBS volume %s on instance %s" % (
                vol_id, instance['InstanceId'])

            snap = ec.create_snapshot(
                VolumeId=vol_id,
            )

            to_tag[retention_days].append(snap['SnapshotId'])

            print "Retaining snapshot %s of volume %s from instance %s for %d days" % (
                snap['SnapshotId'],
                vol_id,
                instance['InstanceId'],
                retention_days,
            )

            snapshot_name = 'N/A'
            if 'Tags' in instance:
                for tags in instance['Tags']:
                    if tags["Key"] == 'Name':
                        snapshot_name = tags["Value"]

            print "Tagging snapshot with Name: %s" % (snapshot_name)

            ec.create_tags(
                Resources=[
                    snap['SnapshotId'],
                ],
                Tags=[
                    {'Key': 'Name', 'Value': snapshot_name},
                    {'Key': 'Description', 'Value': "Created by lambda automated backups"}
                ]
            )

    for retention_days in to_tag.keys():
        delete_date = datetime.date.today() + datetime.timedelta(days=retention_days)
        delete_fmt = delete_date.strftime('%Y-%m-%d')
        print "Will delete %d snapshots on %s" % (len(to_tag[retention_days]), delete_fmt)
        ec.create_tags(
            Resources=to_tag[retention_days],
            Tags=[
                {'Key': 'DeleteOn', 'Value': delete_fmt}
            ]
        )
And this is my script to delete old backups that have a "DeleteOn" tag whose value is today's date in YYYY-MM-DD format:
import boto3
import re
import datetime

ec = boto3.client('ec2')
iam = boto3.client('iam')

"""
This function looks at *all* snapshots that have a "DeleteOn" tag containing
the current day formatted as YYYY-MM-DD. This function should be run at least
daily.
"""


def lambda_handler(event, context):
    account_ids = list()
    try:
        """
        You can replace this try/except by filling in `account_ids` yourself.
        Get your account ID with:
        > import boto3
        > iam = boto3.client('iam')
        > print iam.get_user()['User']['Arn'].split(':')[4]
        """
        iam.get_user()
    except Exception as e:
        # use the exception message to get the account ID the function executes under
        account_ids.append(re.search(r'(arn:aws:sts::)([0-9]+)', str(e)).groups()[1])

    delete_on = datetime.date.today().strftime('%Y-%m-%d')
    filters = [
        {'Name': 'tag-key', 'Values': ['DeleteOn']},
        {'Name': 'tag-value', 'Values': [delete_on]},
    ]
    snapshot_response = ec.describe_snapshots(OwnerIds=account_ids, Filters=filters)

    for snap in snapshot_response['Snapshots']:
        print "Deleting snapshot %s" % snap['SnapshotId']
        ec.delete_snapshot(SnapshotId=snap['SnapshotId'])
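As an aside (not part of the original script), the account ID can also be obtained directly from STS instead of parsing the exception message from iam.get_user(); a minimal sketch:
import boto3

# get_caller_identity() works for any credential type (user, role, etc.)
account_id = boto3.client('sts').get_caller_identity()['Account']
print(account_id)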

Related

boto3 and lambda: Invalid type for parameter KeyConditionExpression when using DynamoDB resource and localstack

I'm having very inconsistent results when trying to use boto3 DynamoDB resources from my local machine vs. from within a Lambda function in localstack. I have the following simple Lambda handler that just queries a table based on the hash key:
import os

import boto3
from boto3.dynamodb.conditions import Key


def handler(event, context):
    dynamodb = boto3.resource(
        "dynamodb", endpoint_url=os.environ["AWS_EP"]
    )
    table = dynamodb.Table("precalculated_scores")
    items = table.query(
        KeyConditionExpression=Key("customer_id").eq(event["customer_id"])
    )
    return items
The environment variable "AWS_EP" is set to my localstack DNS when prototyping (http://localstack:4566).
When I call this Lambda I get the following error:
{
  "errorMessage": "Parameter validation failed:\nInvalid type for parameter KeyConditionExpression, value: <boto3.dynamodb.conditions.Equals object at 0x7f7440201960>, type: <class 'boto3.dynamodb.conditions.Equals'>, valid types: <class 'str'>",
  "errorType": "ParamValidationError",
  "stackTrace": [
    " File \"/opt/code/localstack/localstack/services/awslambda/lambda_executors.py\", line 1423, in do_execute\n execute_result = lambda_function_callable(inv_context.event, context)\n",
    " File \"/opt/code/localstack/localstack/services/awslambda/lambda_api.py\", line 782, in exec_local_python\n return inner_handler(event, context)\n",
    " File \"/var/lib/localstack/tmp/lambda_script_l_dbef16b3.py\", line 29, in handler\n items = table.query(\n",
    " File \"/opt/code/localstack/.venv/lib/python3.10/site-packages/boto3/resources/factory.py\", line 580, in do_action\n response = action(self, *args, **kwargs)\n",
    " File \"/opt/code/localstack/.venv/lib/python3.10/site-packages/boto3/resources/action.py\", line 88, in __call__\n response = getattr(parent.meta.client, operation_name)(*args, **params)\n",
    " File \"/opt/code/localstack/.venv/lib/python3.10/site-packages/botocore/client.py\", line 514, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
    " File \"/opt/code/localstack/.venv/lib/python3.10/site-packages/botocore/client.py\", line 901, in _make_api_call\n request_dict = self._convert_to_request_dict(\n",
    " File \"/opt/code/localstack/.venv/lib/python3.10/site-packages/botocore/client.py\", line 962, in _convert_to_request_dict\n request_dict = self._serializer.serialize_to_request(\n",
    " File \"/opt/code/localstack/.venv/lib/python3.10/site-packages/botocore/validate.py\", line 381, in serialize_to_request\n raise ParamValidationError(report=report.generate_report())\n"
  ]
}
This is a weird error. From what I researched in other questions, it usually happens when using the boto3 client, but I am using boto3 resources. Furthermore, when I run the code locally on my machine, it runs fine.
At first I thought it might be due to different boto3 versions (my local machine uses 1.24.96, while the version inside the Lambda runtime is 1.16.31). However, I downgraded my local version to match the runtime's, and I keep getting the same results.
After some answers on this question I managed to get the code working against actual AWS services, but it still won't work when running against localstack.
Am I doing anything wrong? Or might this be a bug in localstack?
--- Update 1 ---
Changing the return didn't solve the problem:
return {"statusCode": 200, "body": json.dumps(items)}
--- Update 2 ---
The code works when running against actual AWS services instead of running against localstack. Updating the question with this information.
This works fine from both my local machine and Lambda:
import json, boto3
from boto3.dynamodb.conditions import Key


def lambda_handler(event, context):
    dynamodb = boto3.resource(
        "dynamodb",
        endpoint_url="https://dynamodb.eu-west-1.amazonaws.com"
    )
    table = dynamodb.Table("test")
    items = table.query(
        KeyConditionExpression=Key("pk").eq("1")
    )
    print(items)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
Also be sure that event["customer_id"] is in fact a string value as expected by the eq function.
I would check to ensure you have the endpoint setup correctly and that you have the current version deployed.
It may also be that you are trying to return the results of your API call directly from the handler, instead of a proper JSON response as expected:
return {
    'statusCode': 200,
    'body': json.dumps(items)
}
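One related thing to watch for (an assumption on my part, not something raised in the original answers): the DynamoDB resource returns number attributes as Decimal, which json.dumps cannot serialize by default, so returning query items as a JSON body may need a small converter:
import json
from decimal import Decimal


# Hypothetical helper: convert Decimal values returned by the DynamoDB resource
def decimal_default(obj):
    if isinstance(obj, Decimal):
        return float(obj)
    raise TypeError("Object of type %s is not JSON serializable" % type(obj))


# Placeholder data standing in for a query result
body = json.dumps({"Items": [{"customer_id": "1", "score": Decimal("0.75")}]},
                  default=decimal_default)
print(body)  # {"Items": [{"customer_id": "1", "score": 0.75}]}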

Vertex AI Pipelines (Kubeflow) skip step with dependent outputs on later step

I'm trying to run a Vertex AI Pipelines job where I skip a certain pipeline step if the value of a certain pipeline parameter (in this case do_task1) is False. But because another step runs unconditionally and expects the output of the first, potentially skipped, step, I get the following error, regardless of whether do_task1 is True or False:
AssertionError: component_input_artifact: pipelineparam--task1-output_path not found. All inputs: parameters {
  key: "do_task1"
  value {
    type: STRING
  }
}
parameters {
  key: "task1_name"
  value {
    type: STRING
  }
}
It seems like the compiler just cannot find the output output_path from task1. So I wonder if there is any way to have some sort of placeholder for the outputs of steps that sit under a dsl.Condition, so that they get filled with default values unless the actual steps run and overwrite them.
The code below represents the problem and is easily reproducible.
I'm using google-cloud-aiplatform==1.14.0 and kfp==1.8.11
from typing import NamedTuple

from kfp import dsl
from kfp.v2.dsl import Dataset, Input, OutputPath, component
from kfp.v2 import compiler
from google.cloud.aiplatform import pipeline_jobs


@component(
    base_image="python:3.9",
    packages_to_install=["pandas"]
)
def task1(
    # inputs
    task1_name: str,
    # outputs
    output_path: OutputPath("Dataset"),
) -> NamedTuple("Outputs", [("output_1", str), ("output_2", int)]):
    import pandas as pd

    output_1 = task1_name + "-processed"
    output_2 = 2

    df_output_1 = pd.DataFrame({"output_1": [output_1]})
    df_output_1.to_csv(output_path, index=False)

    return (output_1, output_2)


@component(
    base_image="python:3.9",
    packages_to_install=["pandas"]
)
def task2(
    # inputs
    task1_output: Input[Dataset],
) -> str:
    import pandas as pd

    task1_input = pd.read_csv(task1_output.path).values[0][0]

    return task1_input


@dsl.pipeline(
    pipeline_root='pipeline_root',
    name='pipelinename',
)
def pipeline(
    do_task1: bool,
    task1_name: str,
):
    with dsl.Condition(do_task1 == True):
        task1_op = (
            task1(
                task1_name=task1_name,
            )
        )

    task2_op = (
        task2(
            task1_output=task1_op.outputs["output_path"],
        )
    )


if __name__ == '__main__':
    do_task1 = True  # <------------ The variable to modify ---------------

    # compile pipeline
    compiler.Compiler().compile(
        pipeline_func=pipeline, package_path='pipeline.json')

    # create pipeline run
    pipeline_run = pipeline_jobs.PipelineJob(
        display_name='pipeline-display-name',
        pipeline_root='pipelineroot',
        job_id='pipeline-job-id',
        template_path='pipelinename.json',
        parameter_values={
            'do_task1': do_task1,  # pipeline compilation fails with either True or False values
            'task1_name': 'Task 1',
        },
        enable_caching=False
    )

    # execute pipeline run
    pipeline_run.run()
Any help is much appreciated!
The real issue here is with dsl.Condition(): it creates a sub-group, and task1_op is an inner task that is only "visible" from within that sub-group. In the latest SDK, it throws a more explicit error message saying that task2 cannot depend on any inner task.
So to resolve the issue, you just need to move task2 inside the condition; if the condition is not met, you don't have a valid input to feed into task2 anyway.
with dsl.Condition(do_task1 == True):
    task1_op = (
        task1(
            task1_name=task1_name,
        )
    )

    task2_op = (
        task2(
            task1_output=task1_op.outputs["output_path"],
        )
    )

Pass Ansible variables into custom Ansible module

I have a custom module that resides in the library/ directory of my Ansible role. I can call the module from within my playbook, and the code executes correctly, but only if the values it expects are hardcoded in the module code itself. How can I pass values to the module from the playbook?
I've tried the following:
- name: Create repo and use specific KMS key
  ecr_kms:
    repositoryName: "new-ecr-repo"
    encryptionConfiguration.kmsKey: "my-kms-key-id"
and
- name: Create repo and use specific KMS key
  ecr_kms:
    repositoryName: "{{ repo_name }}"
    encryptionConfiguration.kmsKey: "{{ kms_key_id }}"
I would expect these to work, but neither does, and I get the following errors:
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid length for parameter repositoryName, value: 0, valid min length: 2
Invalid length for parameter encryptionConfiguration.kmsKey, value: 0, valid min length: 1
The service module I'm trying to use
The code of the custom module:
#!/usr/bin/python
from urllib import response
import boto3
from jinja2 import Template
from ansible.module_utils.basic import AnsibleModule


def create_repo():
    client = boto3.client('ecr')
    response = client.create_repository(
        # registryId='',
        repositoryName='',
        imageTagMutability='IMMUTABLE',
        imageScanningConfiguration={
            'scanOnPush': True
        },
        encryptionConfiguration={
            'encryptionType': 'KMS',
            'kmsKey': ""
        }
    )


def main():
    create_repo()


if __name__ in '__main__':
    main()
You do need to make your module aware of the arguments you want it to accept, so, in your main function:
#!/usr/bin/env python
from ansible.module_utils.basic import AnsibleModule


def create_repo(repositoryName, kmsKey):
    # Call to the API comes here
    pass


def main():
    module = AnsibleModule(
        argument_spec = dict(
            repositoryName = dict(type = 'str', required = True),
            kmsKey = dict(type = 'str', required = True),
        )
    )
    params = module.params
    create_repo(
        params['repositoryName'],
        params['kmsKey']
    )


if __name__ == '__main__':
    main()
More can be found in the relevant documentation: Argument spec.
With this, your task would be:
- name: Create repo and use specific KMS key
  ecr_kms:
    repositoryName: "{{ repo_name }}"
    kmsKey: "{{ kms_key_id }}"
PS, a word of advice: avoid using a dot in a YAML key; that just makes your life complicated for no good reason.
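For completeness, here is a sketch of how the whole module could look once create_repo actually uses the parameters. The boto3 call mirrors the question's code; the exit_json/fail_json handling is my own assumption, not part of the original answer:
#!/usr/bin/env python
import boto3
from ansible.module_utils.basic import AnsibleModule


def create_repo(repositoryName, kmsKey):
    # Sketch of the API call, based on the question's code
    client = boto3.client('ecr')
    return client.create_repository(
        repositoryName=repositoryName,
        imageTagMutability='IMMUTABLE',
        imageScanningConfiguration={'scanOnPush': True},
        encryptionConfiguration={'encryptionType': 'KMS', 'kmsKey': kmsKey},
    )


def main():
    module = AnsibleModule(
        argument_spec=dict(
            repositoryName=dict(type='str', required=True),
            kmsKey=dict(type='str', required=True),
        )
    )
    params = module.params
    try:
        response = create_repo(params['repositoryName'], params['kmsKey'])
        module.exit_json(changed=True, repository=response['repository']['repositoryName'])
    except Exception as e:
        module.fail_json(msg=str(e))


if __name__ == '__main__':
    main()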

Netmiko / textfsm

Hello, I got my information parsed the way I want it. But now I'm trying to save the output to a .txt file. I'm not sure what to put in backup.write(); if I pass the output variable, it saves the whole output, not the parsed section.
connection = ConnectHandler(**cisco_device)

# print('Entering the enable mode...')
# connection.enable()

prompt = connection.find_prompt()
hostname = prompt[0:-1]
print(hostname)

output = connection.send_command('show interfaces status', use_textfsm=True)

for interface in output:
    if interface['status'] == 'notconnect':
        print(f"interface {interface['port']} \n shutdown")

print(hostname)
print('*' * 85)

# minute = now.minute
now = datetime.now()
year = now.year
month = now.month
day = now.day
hour = now.hour

# creating the backup filename (hostname_date_backup.txt)
filename = f'{hostname}_{month}-{day}-{year}_backup.txt'

# writing the backup to the file
with open(filename, 'w') as backup:
    backup.write()
    print(f'Backup of {hostname} completed successfully')
    print('#' * 30)

print('Closing connection')
connection.disconnect()
My desired result is to run the Cisco IOS command "show interfaces status" and parse the data with the textfsm module so that it only shows the interfaces that are shut down or not connected.
I tried the same with show ip interface brief, because I have no access to a Cisco switch right now. For show interfaces status both methods apply, but with a different output modifier or if condition.
So to get the following output, you can do it in two ways:
1- CLI Output Modifier
show ip interface brief | include down
And the rest is left for TextFSM to parse the output
[{'intf': 'GigabitEthernet2',
  'ipaddr': 'unassigned',
  'proto': 'down',
  'status': 'administratively down'},
 {'intf': 'GigabitEthernet3',
  'ipaddr': '100.1.1.1',
  'proto': 'down',
  'status': 'down'}]
2- Python
You can get the whole output from show ip interface brief and loop over all parsed interfaces and set an if condition to get the down interfaces only. (Recommended)
# Condition for `show ip interface brief`
down = [
    intf
    for intf in intfs
    if intf["proto"] == "down" or intf["status"] in ("down", "administratively down")
]

# Condition for `show interfaces status`
down = [
    intf
    for intf in intfs
    if intf["status"] == "notconnect"
]
Exporting a List[Dict] to a .txt file makes no sense. You don't have any syntax highlighting or formatting in .txt files. It's better to export it to a JSON file. So a complete example of what you want to achieve can be something like:
import json
from datetime import date

from netmiko import ConnectHandler

device = {
    "device_type": "cisco_ios",
    "ip": "x.x.x.x",
    "username": "xxxx",
    "password": "xxxx",
    "secret": "xxxx",
}

with ConnectHandler(**device) as conn:
    print(f'Connected to {device["ip"]}')
    if not conn.check_enable_mode():
        conn.enable()
    hostname = conn.find_prompt()[:-1]
    intfs = conn.send_command(
        command_string="show ip interface brief", use_textfsm=True
    )

print("Connection Terminated")

down = [
    intf
    for intf in intfs
    if intf["proto"] == "down" or intf["status"] in ("down", "administratively down")
]

with open(file=f"{hostname}_down-intfs_{date.today()}.json", mode="w") as f:
    json.dump(obj=down, fp=f, indent=4)
print(f"Completed backup of {hostname} successfully")

# In case you have to export to a text file
# with open(file=f"{hostname}_down-intfs_{date.today()}.txt", mode="w") as f:
#     f.write(down)
#     print(f"Completed backup of {hostname} successfully")

Airflow Failed: ParseException line 2:0 cannot recognize input near

I'm trying to run a test task on Airflow but I keep getting the following error:
FAILED: ParseException 2:0 cannot recognize input near 'create_import_table_fct_latest_values' '.' 'hql'
Here is my Airflow Dag file:
import airflow
from datetime import datetime, timedelta
from airflow.operators.hive_operator import HiveOperator
from airflow.models import DAG

args = {
    'owner': 'raul',
    'start_date': datetime(2018, 11, 12),
    'provide_context': True,
    'depends_on_past': False,
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
    'email': ['raul.gregglino@leroymerlin.ru'],
    'email_on_failure': True,
    'email_on_retry': False
}

dag = DAG('opus_data',
          default_args=args,
          max_active_runs=6,
          schedule_interval="@daily"
          )

import_lv_data = HiveOperator(
    task_id='fct_latest_values',
    hive_cli_conn_id='metastore_default',
    hql='create_import_table_fct_latest_values.hql ',
    hiveconf_jinja_translate=True,
    dag=dag
)

deps = {}

# Explicitly define the dependencies in the DAG
for downstream, upstream_list in deps.iteritems():
    for upstream in upstream_list:
        dag.set_dependency(upstream, downstream)
Here is the content of my HQL file, in case this may be the issue and I can't figure:
I'm testing the connection to check whether the table is created or not; then I'll try to LOAD DATA, which is why the LOAD DATA is commented out.
CREATE TABLE IF NOT EXISTS opus_data.fct_latest_values_new_data (
    id_product STRING,
    id_model STRING,
    id_attribute STRING,
    attribute_value STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED ',';

#LOAD DATA LOCAL INPATH
#'/media/windows_share/schemas/opus/fct_latest_values_20181106.csv'
#OVERWRITE INTO TABLE opus_data.fct_latest_values_new_data;
In the HQL file it should be FIELDS TERMINATED BY ',':
CREATE TABLE IF NOT EXISTS opus_data.fct_latest_values_new_data (
    id_product STRING,
    id_model STRING,
    id_attribute STRING,
    attribute_value STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
And comments should start with -- in an HQL file, not #.
Also, this seems incorrect and is causing the exception: hql='create_import_table_fct_latest_values.hql '
Have a look at this example:
# Create full path for the file
hql_file_path = os.path.join(os.path.dirname(__file__), source['hql'])
print hql_file_path

run_hive_query = HiveOperator(
    task_id='run_hive_query',
    dag = dag,
    hql = """
    {{ local_hive_settings }}
    """ + "\n " + open(hql_file_path, 'r').read()
)
See here for more details.
Or put all the HQL directly into the hql parameter:
hql='CREATE TABLE IF NOT EXISTS opus_data.fct_latest_values_new_data ...'
I managed to find the answer to my issue.
It was related to the path from which my HiveOperator was calling the file. Since no Variable had been defined to tell Airflow where to look, I was getting the error I mentioned in my post.
Once I defined it using the webserver interface, my DAG started to work properly.
I made a change to my DAG code regarding the file location (for organization only), and this is how my HiveOperator looks now:
import_lv_data = HiveOperator(
    task_id='fct_latest_values',
    hive_cli_conn_id='metastore_default',
    hql='hql/create_import_table_fct_latest_values2.hql',
    hiveconf_jinja_translate=True,
    dag=dag
)
Thanks to @panov.st, who helped me in person to identify my issue.
