kinesis agent to lambda, how to get origin file and server - aws-lambda

I have a Kinesis Agent that streams a lot of log file data to Kinesis streams, and a Lambda function that parses the records.
In Lambda, in addition to the record string, I need to know the source file name and the machine name. Is that possible?

You can add it to the data that you send to Kinesis.
Lambda receives Kinesis records as base64-encoded strings, so you can encode into that string a JSON payload of this form:
{
"machine": [machine],
"data": [original data]
}
And then, when processing the records in Lambda (Node.js):
// Decode the base64 Kinesis payload and parse the wrapped JSON
let record_object = JSON.parse(Buffer.from(event.Records[0].kinesis.data, 'base64').toString('utf8'));
let machine = record_object.machine;
let data = record_object.data;
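If you produce the records with your own code rather than the agent, a minimal sketch of the producer side could look like the following (Node.js, AWS SDK v2). The stream name and the way the source file path is obtained are assumptions for illustration:
const os = require('os');
const AWS = require('aws-sdk');
const kinesis = new AWS.Kinesis();

// Wrap each log line with the machine and source file before sending it.
// "my-stream" and the sourceFile argument are placeholders.
async function sendLine(line, sourceFile) {
  const payload = { machine: os.hostname(), file: sourceFile, data: line };
  await kinesis.putRecord({
    StreamName: 'my-stream',
    PartitionKey: os.hostname(),
    Data: JSON.stringify(payload)
  }).promise();
}
The Lambda side can then read payload.machine and payload.file in the same way as record_object.machine above.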

Assuming you are using Kinesis Agent to produce the data stream: the open-source community has added ADDEC2METADATA as a preprocessing option in the agent (see the agent's source code).
Make sure that the source file is in JSON format. If the original format is CSV, use the CSVTOJSON transformer first to convert it to JSON, then pipe it into the ADDEC2METADATA transformer as shown below.
Open agent.json and add the following:
"flows": [
{
"filePattern": "/tmp/app.log*",
"kinesisStream": "my-stream",
"dataProcessingOptions": [
{
"optionName": "CSVTOJSON",
"customFieldNames": ["your", "custom", "field", "names","here", "if","origin","file","is","csv"],
"delimiter": ","
},
{
"optionName": "ADDEC2METADATA",
"logFormat": "RFC3339SYSLOG"
}
]
}
]
}
If your code is running in a container/ECS/EKS etc., where the originating info is not as simple to collect as it is for bare-metal EC2, then use the "ADDMETADATA" option in agent.json as shown below:
{
  "optionName": "ADDMETADATA",
  "timestamp": "true/false",
  "metadata": {
    "key": "value",
    "foo": {
      "bar": "baz"
    }
  }
}
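On the Lambda side, reading the enriched records is the same base64-decode-and-parse pattern as earlier. A minimal sketch (Node.js) follows; whether the added fields appear under a "metadata" key or merged into the top-level record depends on the agent's processing options, so the field access below is an assumption to adapt:
// Hypothetical handler: decodes every Kinesis record and logs the
// metadata added by the agent (exact key names depend on your config).
exports.handler = async (event) => {
  for (const record of event.Records) {
    const payload = JSON.parse(
      Buffer.from(record.kinesis.data, 'base64').toString('utf8')
    );
    // Adjust this lookup to match what the agent actually emits.
    const metadata = payload.metadata || {};
    console.log('metadata:', metadata, 'payload:', payload);
  }
};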

Related

Issues getting flow to send the correct json in body when using powerautomate's http request

I'm using a PowerAutomate Flow to call a native SmartSheet API that does a POST. The POST IS working but my MULTI_PICKLIST type field is not being populated correctly in SmartSheet due to the double quotes.
The API is: concat('https://api.smartsheet.com/2.0/sheets/', variables('vSheetID'), '/rows')
In the Body section of the HTTP REST API call I form my JSON, and the section of interest looks like this:
{
"columnId": 6945615984781188,
"objectValue": {
"objectType": "MULTI_PICKLIST",
"values": [
#{variables('vServices')}
]
}
}
My variable vServices raw output looks like:
{
"body":
{
"name": "vServices",
"value": "Test1, Test2"
}
}
The format needs to be like this (it works using PostMan).
{
"columnId": 6945615984781188,
"objectValue": {
"objectType": "MULTI_PICKLIST",
"values": [
"Test1","Test2"
]
}
}
As a step in formatting my vServices variable, I tried a replace function to replace the ',' with '","', but this ultimately ends up as \",\".
Any suggestions on how to get around this? Again, ultimately I need the JSON body to read like the following, but I haven't been able to achieve this in the Body section:
{
"columnId": 6945615984781188,
"objectValue": {
"objectType": "MULTI_PICKLIST",
"values": [
"Test1","Test2"
]
}
}
vs. this (when using the replace function):
{
"columnId": 6945615984781188,
"objectValue": {
"objectType": "MULTI_PICKLIST",
"values": [
"Test1\",\"Test2"
]
}
}
Thank you in advance,
I resolved my issue by taking the original variable and sending it to a Compose step that did a split on the comma separator. I then added a step to set a new variable to the output of the Compose step. This left me with a perfectly set-up array in the exact format I needed! This resolved the issues I was having with double quotes and escape sequences.
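For reference, the split in the Compose step can be done with an expression along these lines; the exact separator (',' versus ', ') is an assumption that depends on how the picklist values arrive:
split(variables('vServices'), ', ')
The resulting array can then be referenced directly inside the "values" array of the request body, so no manual quoting is needed.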

AWS sample test json for an SQS message with SNS notifications in it, to hit a lambda

I am running into some trouble understanding how to create a sample test JSON to test my Lambda. Currently the workflow is SNS -> SQS -> Lambda. I am trying to test the Lambda in the console with a sample JSON. I have tried putting the SNS message under the "body" field of the SQS record, both as JSON and as a JSON string, but I am facing parsing issues. I have referred to a few other SO answers (ref: Amazon SNS -> SQS message body), but the suggestion was to use the raw-messaging option, which my subscribers do not use. Can someone post a sample JSON structure for testing, for SQS records with SNS notifications in them?
PS:
I tried the test event below (without a JSON string in the body). I also tried using a JSON string for the body instead.
{
"Records": [
{
"messageId": "19dd0b57-b21e-4ac1-bd88-01bbb068cb78",
"receiptHandle": "MessageReceiptHandle",
"body":{
"Type" : "Notification",
"MessageId" : "84102bd5-8890-4ed5-aeba-c15fafc926dc",
"TopicArn" : "arn:aws:sns:eu-west-1:534706846367:HelloWorld",
"Message" : "hello World",
"Timestamp" : "2012-06-05T13:44:22.360Z",
"SignatureVersion" : "1",
"Signature" : "Qzh0qXhijBKylaFwc9PGE+lQQDwHGWkIzCW2Ld1eVrxNfSem4yyBTgouqGX26V0m1qhFD4RQcBzE3oNqx5jFhJfV4hN45FNcsFVnmfLPGNUTmJWblSk8f6znWgTy8UtK9xrTeNYzK59k3VJ4WTJ5kCEj+2vH7sBV15fAXeCAtdQ=",
"SigningCertURL" : "https://sns.eu-west-1.amazonaws.com/SimpleNotificationService-f3ecfb7224c7233fe7bb5f59f96de52f.pem",
"UnsubscribeURL" : "https://sns.eu-west-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:eu-west-1:534706846367:HelloWorld:8a3acde2-cb0b-4a56-9b9c-b75ed7307556"
},
"attributes": {
"ApproximateReceiveCount": "1",
"SentTimestamp": "1523232000000",
"SenderId": "123456789012",
"ApproximateFirstReceiveTimestamp": "1523232000001"
},
"messageAttributes": {},
"md5OfBody": "{{{md5_of_body}}}",
"eventSource": "aws:sqs",
"eventSourceARN": "arn:aws:sqs:us-east-1:123456789012:MyQueue",
"awsRegion": "us-east-1"
}
]
}
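For reference, in a real SQS-triggered invocation Records[i].body arrives as a string, and with raw message delivery disabled that string contains the JSON-encoded SNS envelope. A minimal Node.js sketch of parsing such a record, assuming that shape, looks like this:
// Assumes each SQS record body is a JSON string wrapping the SNS envelope.
exports.handler = async (event) => {
  for (const record of event.Records) {
    const snsEnvelope = JSON.parse(record.body); // body is a string in real events
    const message = snsEnvelope.Message;         // e.g. "hello World"
    console.log(message);
  }
};
So a test event should embed the notification as an escaped JSON string under "body" rather than as a nested object.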

Azure Data Factory REST API paging with Elasticsearch

While developing a pipeline that uses Elasticsearch as a source, I ran into an issue related to paging. I am using the Elasticsearch SQL API. Basically, I started by making the request in Postman and it works well. The request body looks like the following:
{
"query":"SELECT Id,name,ownership,modifiedDate FROM \"core\" ORDER BY Id",
"fetch_size": 20,
"cursor" : ""
}
After the first run, the response body contains a cursor string, which is a pointer to the next page. If I send the request in Postman and provide the cursor value from the previous request, it returns data for the second page, and so on. I am trying to achieve the same result in Azure Data Factory. For this I am using a copy activity, which stores the response to Azure Blob storage. The setup for the source is the following:
(screenshot: copy activity source configuration)
This is the expression for the body:
{
"query": "SELECT Id,name,ownership,modifiedDate FROM \"#{variables('TableName')}\" WHERE ORDER BY Id","fetch_size": #{variables('Rows')}, "cursor": ""
}
I have no idea how to correctly set up the pagination rule. The pipeline works properly, but only for the first request. I've tried setting up Headers.cursor with the expression $.cursor, but this setup leads to an infinite loop and the pipeline fails with the Elasticsearch restriction.
I've also tried to read the documentation at https://learn.microsoft.com/en-us/azure/data-factory/connector-rest#pagination-support, but it seems pretty limited in terms of usage examples and difficult to understand.
Could somebody help me understand how to build the pipeline with paging?
The response with the cursor looks like:
{
"columns": [
{
"name": "companyId",
"type": "integer"
},
{
"name": "name",
"type": "text"
},
{
"name": "ownership",
"type": "keyword"
},
{
"name": "modifiedDate",
"type": "datetime"
}
],
"rows": [
[
2,
"mic Inc.",
"manufacture",
"2021-03-31T12:57:51.000Z"
]
],
"cursor": "g/WuAwFaAXNoRG5GMVpYSjVWR2hsYmtabGRHTm9BZ0FBQUFBRUp6VGxGbUpIZWxWaVMzcGhVWEJITUhkbmJsRlhlUzFtWjNjQUFBQUFCQ2MwNWhaaVIzcFZZa3Q2WVZGd1J6QjNaMjVSVjNrdFptZDP/////DwQBZgljb21wYW55SWQBCWNvbXBhbnlJZAEHaW50ZWdlcgAAAAFmBG5hbWUBBG5hbWUBBHRleHQAAAABZglvd25lcnNoaXABCW93bmVyc2hpcAEHa2V5d29yZAEAAAFmDG1vZGlmaWVkRGF0ZQEMbW9kaWZpZWREYXRlAQhkYXRldGltZQEAAAEP"
}
I finally found the solution; hopefully it will be useful for the community.
Basically, what needs to be done is to split the solution into a few steps.
Step 1: Make the first request as in the question description and stage the file to blob storage.
Step 2: Read the blob file, get the cursor value, and set it to a variable.
Step 3: Keep requesting data with a changed body:
{"cursor" : "#{variables('cursor')}" }
The pipeline looks like this:
(screenshot: pipeline)
The configuration of pagination looks like the following:
(screenshot: pagination rules)
It is a workaround, as the server ignores this header, but we need something that allows sending the request in a loop.

Ruby - Parse a file into hash

I have a file containing hundreds of object and value combinations, in the manner shown below. I want to get input from the user as an object name and numeric value, and return the associated value.
Object cefcFRUPowerOperStatus
Type PowerOperType
1:offEnvOther
2:on
3:offAdmin
4:offDenied
5:offEnvPower
6:offEnvTemp
Object cefcModuleOperStatus
Type ModuleOperType
1:unknown
2:ok
3:disabled
4:okButDiagFailed
5:boot
6:selfTest
E.g. - input -
objectName = 'cefcModuleOperStatus'
TypeNumber = '4'
Return - 'okButDiagFailed'
I am not familiar with Ruby and need to get this done to help a peer, so please excuse me if this is a novice question.
Note: I have to create the file myself, so any file format that makes this work would be a great help.
If, like you say, you have control over creating the original data file, then creating it in JSON format makes accessing it trivial.
Here is a repl.it with a complete working example; just select the main.rb file and hit run!
For example, if you create a JSON file like this:
data.json
{
"cefcFRUPowerOperStatus": {
"type": "PowerOperType",
"status": {
"1": "offEnvOther",
"2": "on",
"3": "offAdmin",
"4": "offDenied",
"5": "offEnvPower",
"6": "offEnvTemp"
}
},
"cefcModuleOperStatus": {
"type": "ModuleOperType",
"status": {
"1": "unknown",
"2": "ok",
"3": "disabled",
"4": "okButDiagFailed",
"5": "boot",
"6": "selfTest"
}
}
}
Then parsing it and accessing it in Ruby is as simple as:
require 'json'
file = File.read('data.json')
data = JSON.parse(file)
#accessing this data is simple now:
puts data["cefcModuleOperStatus"]["status"]["4"]
# > okButDiagFailed
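Since you want to take the object name and number from the user, a small interactive sketch (still assuming the data.json above) could look like this:
require 'json'

data = JSON.parse(File.read('data.json'))

print 'Object name: '
object_name = gets.chomp
print 'Type number: '
type_number = gets.chomp

# dig returns nil if either key is missing
puts data.dig(object_name, 'status', type_number) || 'not found'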
Note that this JSON format will work if your statuses are unique. If they are not, you can still use this approach, but you will need to convert the JSON to an array format. Let me know if this is the case and I can show you how to modify the JSON and Ruby code for it.
Hope that helps, let me know if you have further questions about how this works.

Remove fields by their name pattern

We are currently using Logstash with Elasticsearch to log some of our application events.
Some events hold fields that are dynamically named.
We want to apply a filter that will remove or merge them before they enter Elasticsearch.
For example:
{
"Root": {
"EventType": "Info",
"Timestamp": 20150713153757.758
},
"Event": {
"Message": "itemsViews Created in 1 mSec",
"Cache_11542": true,
"Cache_10242": false,
"Cache_55240": 124
}
}
In this case we would like to remove all the fields starting with "Cache_" under the Event object, so that the output to Elasticsearch will be:
{
"Root": {
"EventType": "Info",
"Timestamp": 20150713153757.758
},
"Event": {
"Message": "itemsViews Created in 1 mSec"
}
}
Is there a way to define a filter in the Logstash configuration file to achieve this?
Many thanks in advance.
Looks like the Ruby filter solution that #magnus-bäck points out might be your answer. I had originally suggested the mutate filter, using the "remove_field" array in conjunction with gsub: gsub to regex-match your Cache* fields, which can then be collected into a variable for use in mutate. However, since you have an arbitrary number of Cache fields, I like the Ruby script better. :)
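A minimal sketch of such a Ruby filter, assuming a reasonably recent Logstash with the ruby filter plugin and the nested Event object from the example, might look like this:
filter {
  ruby {
    code => "
      event_fields = event.get('Event')
      if event_fields.is_a?(Hash)
        event_fields.keys.each do |k|
          # Drop any sub-field of Event whose name starts with 'Cache_'
          event.remove('[Event][' + k + ']') if k.start_with?('Cache_')
        end
      end
    "
  }
}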
