Aws lambda code explanation - aws-lambda

Can anybody please explain the working of the below code.
"def lambda_handlerOut(event, context):
if len(event) > 0:
success=1
print("length of event outside for--"+str(len(event)))
for record in event['Records']:
print("length of event--"+str(len(event)))
bucket=record['s3']['bucket']['name']
key=record['s3']['object']['key']
print("Bucket--"+bucket)
print("File that triggered this event--"+key)
Thanks in advance.
Regards,
Eleena Jose

This is a Lambda that receives S3 events - for example a PutObject request that creates a new file.
The method is the standard Python function - take a look at the Lambda Function Handler Docs for more details.
The structure of the event is defined here but basically there are some number of Records that are being iterated through with and, for each record, the bucket and key are being extracted and printed.
So, in more detail (comments above the line they reference):
# standard lambda event handler definition
def lambda_handlerOut(event, context):
# make sure that something was given - likely unneeded
if len(event) > 0:
success=1
print("length of event outside for--"+str(len(event)))
# loop through each record in Records
for record in event['Records']:
print("length of event--"+str(len(event)))
# take a look at the event structure - just extracting parts
bucket=record['s3']['bucket']['name']
# key is the object name - that is, the file
key=record['s3']['object']['key']
print("Bucket--"+bucket)
print("File that triggered this event--"+key)
EDIT
As I linked to above, the data in the event object looks something like:
{
"Records":[
{
"eventVersion":"2.0",
"eventSource":"aws:s3",
"awsRegion":"us-east-1",
"eventTime":"1970-01-01T00:00:00.000Z",
"eventName":"ObjectCreated:Put",
"userIdentity":{
"principalId":"AIDAJDPLRKLG7UEXAMPLE"
},
"requestParameters":{
"sourceIPAddress":"127.0.0.1"
},
"responseElements":{
"x-amz-request-id":"C3D13FE58DE4C810",
"x-amz-id-2":"FMyUVURIY8/IgAtTv8xRjskZQpcIZ9KG4V5Wp6S7S/JRWeUWerMUE5JgHvANOjpD"
},
"s3":{
"s3SchemaVersion":"1.0",
"configurationId":"testConfigRule",
"bucket":{
"name":"mybucket",
"ownerIdentity":{
"principalId":"A3NL1KOZZKExample"
},
"arn":"arn:aws:s3:::mybucket"
},
"object":{
"key":"HappyFace.jpg",
"size":1024,
"eTag":"d41d8cd98f00b204e9800998ecf8427e",
"versionId":"096fKKXTRTtl3on89fVO.nfljtsv6qko",
"sequencer":"0055AED6DCD90281E5"
}
}
}
]
}
So, as an example, bucket=record['s3']['bucket']['name'] starts by getting the s3 record from the data which leaves:
"s3":{
"s3SchemaVersion":"1.0",
"configurationId":"testConfigRule",
"bucket":{
"name":"mybucket",
"ownerIdentity":{
"principalId":"A3NL1KOZZKExample"
},
"arn":"arn:aws:s3:::mybucket"
},
"object":{
"key":"HappyFace.jpg",
"size":1024,
"eTag":"d41d8cd98f00b204e9800998ecf8427e",
"versionId":"096fKKXTRTtl3on89fVO.nfljtsv6qko",
"sequencer":"0055AED6DCD90281E5"
}
}
From there, it gets the bucket stanza:
"bucket":{
"name":"mybucket",
"ownerIdentity":{
"principalId":"A3NL1KOZZKExample"
},
"arn":"arn:aws:s3:::mybucket"
}
and lastly, the name:
"name":"mybucket"
This is assigned to the variable bucket which is printed out later. The key (which is the file name in this example) works the same way but gets different parts of the event.
Does that make sense now?

Related

Refactor CHEF cookbooks

I have a CHEF recipe for package install on windows hosts. We used to use chef13 client; however we have upgraded to chef16, and we are getting some issues with the cookbook migration.
When I run the existing role as-it-is on a win2k16 chef16 kitchen platform, I get this error,
Chef::Exceptions::ImmutableAttributeModification
------------------------------------------------
Node attributes are read-only when you do not specify which precedence level to set. To set an attribute use code like `node.default["key"] = "value"'
and it points out to this section of the recipe
67: guid = guid.gsub(/{/, '').gsub(/}/, '') # remove {}
68: log "[#{role}][#{recipe_name}][#{app_name}]: GUID for installer #{app_name} is #{guid}"
69>> app_params.store('product_guid', guid) # add GUID to the list of parameters
70: end
Hence, what I understand that chef16 dosen't like the hash.store method.
I have tried some other alternate solutions to update the hash, however they do not work.
Using merge method:-
app_params.merge!({ :product_guid => 'guid' })
Using precedence level explicitly:- ( I have used .default, .override as well, however its the same symptom)
node.force_override!['app_params']['product_guid'] = guid
(reff from : https://github.com/chef/chef/issues/6563)
Note that when using the above methods, i do not get any failures, however the hash dosent get updated at all.
I have added few log statements to show this:-
log "[#{role}][#{recipe_name}][#{app_name}]: app_params.keys is #{app_params.keys}"
log "[#{role}][#{recipe_name}][#{app_name}]: app_params is #{JSON.pretty_generate(app_params)}"
* log[[my-role][my-recipe][my-pkgName]: app_params.keys is ["app_name", "full_path", "action"]] action write
* log[[my-role][my-recipe][my-pkgName]: app_params is {
"app_name": "my-pkgName",
"full_path": "https://myArtifactRepo/.../.../../my-pkgName.msi",
"action": "install"
}] action write
## ^^ hash keys and hash values (app_params.keys and app_params) BEFORE the force_override method is used
log "[#{role}][#{recipe_name}][#{app_name}]: app_params.keys is #{app_params.keys}"
log "[#{role}][#{recipe_name}][#{app_name}]: app_params is #{JSON.pretty_generate(app_params)}"
* log[[my-role][my-recipe][my-pkgName]: app_params.keys is ["app_name", "full_path", "action"]] action write
* log[[my-role][my-recipe][my-pkgName]: app_params is {
"app_name": "my-pkgName",
"full_path": "https://myArtifactRepo/.../.../../my-pkgName.msi",
"action": "install"
}] action write
## ^^ hash keys and hash values (app_params.keys and app_params) AFTER the force_override method is used
Note that the hash and hash key values has not been updated it is still,
["app_name", "full_path", "action"];
however we would expect
["app_name", "full_path", "action", "product_guid"]
and the hash itself like,
"app_name": "my-pkgName",
"full_path": "https://myArtifactRepo/.../.../../my-pkgName.msi",
"action": "install",
"product_guid" : XXXXYYY-0WRR-1234-ABCD-3ERDFR234GRT

Get metadata for a column with google Sheets API

I have a Google spreadsheet that I am connecting to and interacting with using the google-python-api-client package. Following this description on metadata search, and the links in it for the request body, I have written a function to get metadata for a range:
def get_metadata_by_range(range_: Union[dict, str]) -> dict:
if isinstance(range_, str):
print("String range: ", range_)
request_body = {"dataFilters": \
{"a1Range": range_}}
elif isinstance(range_, dict):
print("Dict range: ", range_)
request_body = {"dataFilters": \
[{"gridRange": range_}]}
else:
return None
request = service.spreadsheets().developerMetadata().\
search(spreadsheetId=SPREADSHEET_ID, body=request_body)
return request.execute()
Calling this with a range, either A1 notation or a gridRange will cause an error to occur though. For example, calling it with this line get_metadata_by_range("Metadata!A:A") will cause the following traceback.
String range: Metadata!A:A
Traceback (most recent call last):
File "oqc_server/fab/gapc.py", line 82, in <module>
get_metadata_by_range("Metadata!A:A")
File "oqc_server/fab/gapc.py", line 69, in get_metadata_by_range
return request.execute()
File "/media/kajsa/Storage/Projects/oqc_server/venv/lib/python3.7/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
return wrapped(*args, **kwargs)
File "/media/kajsa/Storage/Projects/oqc_server/venv/lib/python3.7/site-packages/googleapiclient/http.py", line 856, in execute
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 500 when requesting https://sheets.googleapis.com/v4/spreadsheets/1RhheCsI3kHrm8yK2Yio2kAOU4VOzYdz-eK0vjiMY7co/developerMetadata:search?alt=json returned "Internal error encountered."
Any ideas on what is causing this and how to solve it?
You want to search and retrieve the developer metadata from the range using the method of Method: spreadsheets.developerMetadata.search of Sheets API.
You want to achieve this using google-api-python-client with python.
You have already been able to get and put values for Spreadsheet with Sheets API.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Modification points:
When you want to search the developer metadata with the range, please set the gridrange to dataFilters[].developerMetadataLookup.metadataLocation.dimensionRange.
When the range is set to dataFilters[].a1Range and dataFilters[].gridRange, I could confirm that the same error occurs.
Sample script:
The sample script for retrieving the developer metadata from the range is as follows. Before you use this, please set the variables of spreadsheet_id and sheet_id.
service = build('sheets', 'v4', credentials=creds)
spreadsheet_id = '###' # Please set the Spreadsheet ID.
sheet_id = ### # Please set the sheet ID.
search_developer_metadata_request_body = {
"dataFilters": [
{
"developerMetadataLookup": {
"metadataLocation": {
"dimensionRange": {
"sheetId": sheet_id,
"dimension": "COLUMNS",
"startIndex": 0,
"endIndex": 1
}
}
}
}
]
}
request = service.spreadsheets().developerMetadata().search(
spreadsheetId=spreadsheet_id, body=search_developer_metadata_request_body)
response = request.execute()
print(response)
Above script retrieves the developer metadata from the column "A" of sheet_id.
Note:
Please modify above script for your actual script.
In the current stage, the Developer Metadata can be added to the Spreadsheet, each sheet in the Spreadsheet and row and column. Please be careful this. Ref
References:
Method: spreadsheets.developerMetadata.search
Adding Developer Metadata- DeveloperMetadataLookup
If I misunderstood your question and this was not the direction you want, I apologize.
There is a bug with developerMetadata related to a1Range objects being passed as filters.
Edit
I've checked the bug again and a fix has been implemented.

terraform: lambda invocation during destroy

I have lambda invocation in our terraform-built environment:
data "aws_lambda_invocation" "this" {
count = var.invocation == "true" ? 1 : 0
function_name = aws_lambda_function.this.function_name
input = <<JSON
{
"Name": "Invocation"
}
JSON
}
The problem: the function is invoked not only during creation ("apply") but deletion ("destroy") too. How to invoke it during creation only? I thought about checking environment variables in the lambda (perhaps TF adds name of the process here or something like that) but I hope there's a better way.
Worth checking if you can use the -var 'lambda_xxx=execute' option while running the terraform command to check if the lambda code needs to be executed or not terraform docs
Using that variable lambda_xxx passed in via the command line while executing the command, you can check in the terraform code whether you want to run the lambda code or not.
Below code creates a waf only if the count is 1
resource "aws_waf_rule" "wafrule" {
depends_on = ["aws_waf_ipset.ipset"]
name = "${var.environment}-WAFRule"
metric_name = "${replace(var.environment, "-", "")}WAFRule"
count = "${var.is_waf_enabled == "true" ? 1 : 0}"
predicates {
data_id = "${aws_waf_ipset.ipset.id}"
negated = false
type = "IPMatch"
}
}
Variable declared in variables.tf file
variable "is_waf_enabled" {
type = "string"
default = "false"
description = "String value to indicate if WAF/API KEY is turned on or off (true/any_value)"
}
When you run the command any value other than true is considered false as we are just checking for string true.
Similarly you can do this for your lambda.
There are better alternative solutions for this problem now, which weren't available at the time the question was asked.
Lambda Invocation Resource in AWS provider: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_invocation
Lambda Based Resource in LambdaBased provider: https://registry.terraform.io/providers/thetradedesk/lambdabased/latest/docs/resources/lambdabased_resource
With the disclaimer that I'm the developer of the latter one: If the underlying problem is managing a resource through lambda functions, the lambda based resource has some good features tailored specifically to accomplish that with the obvious drawback of adding another provider dependency.

How to access JSON from external data source in Terraform?

I am receiving JSON from a http terraform data source
data "http" "example" {
url = "${var.cloudwatch_endpoint}/api/v0/components"
# Optional request headers
request_headers {
"Accept" = "application/json"
"X-Api-Key" = "${var.api_key}"
}
}
It outputs the following.
http = [{"componentID":"k8QEbeuHdDnU","name":"Jenkins","description":"","status":"Partial Outage","order":1553796836},{"componentID":"ui","name":"ui","description":"","status":"Operational","order":1554483781},{"componentID":"auth","name":"auth","description":"","status":"Operational","order":1554483781},{"componentID":"elig","name":"elig","description":"","status":"Operational","order":1554483781},{"componentID":"kong","name":"kong","description":"","status":"Operational","order":1554483781}]
which is a string in terraform. In order to convert this string into JSON I pass it to an external data source which is a simple ruby function. Here is the terraform to pass it.
data "external" "component_ids" {
program = ["ruby", "./fetchComponent.rb",]
query = {
data = "${data.http.example.body}"
}
}
Here is the ruby function
#!/usr/bin/env ruby
require 'json'
data = JSON.parse(STDIN.read)
results = data.to_json
STDOUT.write results
All of this works. The external data outputs the following (It appears the same as the http output) but according to terraform docs this should be a map
external1 = {
data = [{"componentID":"k8QEbeuHdDnU","name":"Jenkins","description":"","status":"Partial Outage","order":1553796836},{"componentID":"ui","name":"ui","description":"","status":"Operational","order":1554483781},{"componentID":"auth","name":"auth","description":"","status":"Operational","order":1554483781},{"componentID":"elig","name":"elig","description":"","status":"Operational","order":1554483781},{"componentID":"kong","name":"kong","description":"","status":"Operational","order":1554483781}]
}
I was expecting that I could now access data inside of the external data source. I am unable.
Ultimately what I want to do is create a list of the componentID variables which are located within the external data source.
Some things I have tried
* output.external: key "0" does not exist in map data.external.component_ids.result in:
${data.external.component_ids.result[0]}
* output.external: At column 3, line 1: element: argument 1 should be type list, got type string in:
${element(data.external.component_ids.result["componentID"],0)}
* output.external: key "componentID" does not exist in map data.external.component_ids.result in:
${data.external.component_ids.result["componentID"]}
ternal: lookup: lookup failed to find 'componentID' in:
${lookup(data.external.component_ids.*.result[0], "componentID")}
I appreciate the help.
can't test with the variable cloudwatch_endpoint, so I have to think about the solution.
Terraform can't decode json directly before 0.11.x. But there is a workaround to work on nested lists.
Your ruby need be adjusted to make output as variable http below, then you should be fine to get what you need.
$ cat main.tf
variable "http" {
type = "list"
default = [{componentID = "k8QEbeuHdDnU", name = "Jenkins"}]
}
output "http" {
value = "${lookup(var.http[0], "componentID")}"
}
$ terraform apply
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Outputs:
http = k8QEbeuHdDnU

Cloudwatch to Elasticsearch parse/tokenize log event before push to ES

Appreciate your help in advance.
In my scenario - Cloudwatch multiline logs needs to be shipped to elasticsearch service.
ECS--awslog->Cloudwatch---using lambda--> ES Domain
(Basic flow though very open to change how data is shipped from CW to ES )
I was able to solve multi-line issue using multi_line_start_pattern BUT
The main issue I am experiencing now - is my logs have ODL format (following format)
[yyyy-mm-ddThh:mm:ss.SSS-Z][ProductName-Version][Log Level]
[Message ID][LoggerName][Key Value Pairs][[
Message]]
AND I will like to parse and tokenize log events before storing in ES (vs the complete log line ).
For example:
[2018-05-31T11:08:49.148-0400] [glassfish 4.1] [INFO] [] [] [tid: _ThreadID=43 _ThreadName=Thread-8] [timeMillis: 1527692929148] [levelValue: 800] [[
[] INFO : (DummyApplicationFunctionJPADAO) EntityManagerFactory located under resource lookup name [null], resource name=AuthorizationPU]]
Needs to be parsed and tokenize using format
timestamp 2018-05-31T11:08:49.148-0400
ProductName-Version glassfish 4.1
LogLevel INFO
MessageID
LoggerName
KeyValuePairs tid: _ThreadID=43 _ThreadName=Thread-8
Message [] INFO : (DummyApplicationFunctionJPADAO)
EntityManagerFactorylocated under resource lookup name
[null], resource name=AuthorizationPU
In above Key Value pairs repeat and are variable - for simplicity I can store all as one long string.
As far as what I gathered about Cloudwatch - It seems Subscription Filter Pattern reg ex support is very limited really not sure how to fit the above pattern. For lambda function that pushes the data to ES have not seen AWS doc or examples that support lambda as means to parse and push for ES.
Will appreciate if someone can please guide what/where will be best option to parse CW logs before it gets into ES => Subscription Filter -Pattern vs in lambda function or any other way.
Thank you .
From what I can see your best bet is what you're suggesting, a CloudWatch log triggered lambda that reformats the logged data into your ES prefered format and then posts it into ES.
You'll need to subscribe this lambda to your CloudWatch logs. You can do this on the lambda console, or the cloudwatch console (https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Subscriptions.html).
The lambda's event payload will be: { "awslogs": { "data": "encoded-logs" } }. Where encoded-logs is a Base64 encoding of a gzipped JSON.
For example, the sample event (https://docs.aws.amazon.com/lambda/latest/dg/eventsources.html#eventsources-cloudwatch-logs) can be decoded in node, for example, using:
const zlib = require('zlib');
const data = event.awslogs.data;
const gzipped = Buffer.from(data, 'base64');
const json = zlib.gunzipSync(gzipped);
const logs = JSON.parse(json);
console.log(logs);
/*
{ messageType: 'DATA_MESSAGE',
owner: '123456789123',
logGroup: 'testLogGroup',
logStream: 'testLogStream',
subscriptionFilters: [ 'testFilter' ],
logEvents:
[ { id: 'eventId1',
timestamp: 1440442987000,
message: '[ERROR] First test message' },
{ id: 'eventId2',
timestamp: 1440442987001,
message: '[ERROR] Second test message' } ] }
*/
From what you've outlined, you'll want to extract the logEvents array, and parse this into an array of strings. I'm happy to give some help on this too if you need it (but I'll need to know what language you're writing your lambda in- there are libraries for tokenizing ODL- so hopefully it's not too hard).
At this point you can then POST these new records directly into your AWS ES Domain. Somewhat crypitcally the S3-to-ES guide gives a good outline of how to do this in python: https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-aws-integrations.html#es-aws-integrations-s3-lambda-es
You can find a full example for a lambda that does all this (by someone else) here: https://github.com/blueimp/aws-lambda/tree/master/cloudwatch-logs-to-elastic-cloud

Resources