Lambda Step Functions: Fire & Forget pattern - aws-lambda

I have a Python-based Lambda (core Lambda) serving a synchronous API. The API is triggered from a user-facing interactive application. I now need to add some logging & metrics (slightly compute intensive) to the Lambda. I don't want the core Lambda to be delayed by this, so I want to push the work into a new Lambda (logging Lambda). What I want is: the core Lambda completes its work, triggers the logging Lambda (fire & forget), and returns the response to the API call immediately. The end state (success/failure) of the logging Lambda is irrelevant.
Can Step Functions achieve this? The core & logging Lambdas have their own end states and I'm not sure if the Step Functions pattern can accommodate this.

You can start an asynchronous Lambda function invocation using "InvocationType": "Event" in your Invoke parameters. To do that in Step Functions, the ASL code looks like this:
{
  "StartAt": "Invoke Lambda function asynchronously",
  "States": {
    "Invoke Lambda function asynchronously": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "myFunction",
        "Payload.$": "$",
        "InvocationType": "Event"
      },
      "End": true
    }
  }
}
Having an async Lambda Task (as shown above) after your core Lambda Task should work. To make sure a failure of the logging Lambda doesn't affect the overall workflow, you can add a Catcher to it on States.ALL and redirect to a Succeed state, as sketched below.
https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html#error-handling-fallback-states
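For example, a sketch of the two Tasks with a catch-all fallback on the logging step (function and state names are placeholders):
{
  "StartAt": "Core Lambda",
  "States": {
    "Core Lambda": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "coreFunction",
        "Payload.$": "$"
      },
      "Next": "Invoke logging Lambda asynchronously"
    },
    "Invoke logging Lambda asynchronously": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "loggingFunction",
        "Payload.$": "$",
        "InvocationType": "Event"
      },
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "Ignore logging failure"
        }
      ],
      "End": true
    },
    "Ignore logging failure": {
      "Type": "Succeed"
    }
  }
}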

If the secondary Lambda is purely invoked for logging purposes and the state machine is not dependent on its output, you could invoke the secondary Lambda from within your primary Lambda, then return from the primary Lambda. This way your state machine doesn't need to know about the logging steps and you can "fire and forget" before resuming your workflow.
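For instance, a minimal sketch of that in Python with boto3, assuming a hypothetical logging function name:
import json
import boto3

lambda_client = boto3.client("lambda")

def handler(event, context):
    result = do_core_work(event)  # placeholder for your existing business logic

    # Fire-and-forget: an "Event" invocation returns as soon as the request is
    # queued, without waiting for the logging Lambda to run or succeed.
    lambda_client.invoke(
        FunctionName="logging-function",   # hypothetical function name
        InvocationType="Event",
        Payload=json.dumps({"metrics_for": result}),
    )

    return result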

Related

How to trigger an AWS Lambda by sending an event to EventBridge

I have an AWS Lambda whose trigger is an EventBridge rule.
The rule looks like this:
{
  "detail-type": ["ECS Task State Change"],
  "source": ["aws.ecs"],
  "detail": {
    "stopCode": ["EssentialContainerExited", "UserInitiated"],
    "clusterArn": ["arn:aws:ecs:.........."],
    "containers": {
      "name": ["some name"]
    },
    "lastStatus": ["DEACTIVATING"],
    "desiredStatus": ["STOPPED"]
  }
}
This event is normally emitted when the ECS task status changes (in this case, when a task is killed).
My questions are:
Can I simulate this event from the command line?
Maybe by running aws events put-events --entries file://putevents.json
(What should I write in the putevents.json file?)
Can I simulate this event from JavaScript code?
TL;DR Yes and yes, provided you deal with the limitation that user-generated events cannot have a source that begins with aws.
Send custom events to EventBridge with the PutEvents API. The API is available in the CLI as well as in the SDKs (see AWS JS SDK). The list of custom events you pass in the entries parameter must have three fields at a minimum:
[
  {
    "source": "my-custom-event", // cannot start with aws!
    "detail-type": "ECS Task State Change",
    "detail": {} // copy from the ECS sample events docs
  }
]
The ECS task state change event samples in the ECS documentation make handy templates for your custom events. You can safely prune any non-required field that you don't need for pattern matching.
Custom events are not permitted to mimic the aws system event sources. So amend your rule to also match on your custom source name:
"source": ["aws.ecs", "my-custom-event"],

How do I use Heartbeat with a Callback Return Step Function in my Lambda Function?

My Lambda function is required to send a token back to the step function for it to continue, as it is a task within the state machine.
Looking at my try/catch block of the lambda function, I am contemplating:
The order of SendTaskHeartbeatCommand and SendTaskSuccessCommand
The required parameters of SendTaskHeartbeatCommand
Whether I should add the SendTaskHeartbeatCommand to the catch block, and then if yes, which order they should go in.
Current code:
try {
  const magentoCallResponse = await axios(requestObject);
  await stepFunctionClient.send(new SendTaskHeartbeatCommand(taskToken));
  await stepFunctionClient.send(new SendTaskSuccessCommand({ output: JSON.stringify(magentoCallResponse.data), taskToken }));
  return magentoCallResponse.data;
} catch (err: any) {
  console.log("ERROR", err);
  await stepFunctionClient.send(new SendTaskFailureCommand({ error: JSON.stringify("Error Sending Data into Magento"), taskToken }));
  return false;
}
I have read the documentation for AWS SDK V3 for SendTaskHeartbeatCommand and am confused with the required input.
The SendTaskHeartbeat and SendTaskSuccess API actions serve different purposes.
When your task completes, you call SendTaskSuccess to report this back to Step Functions and to provide the results from the Task that your workflow can then process. You do not need to call SendTaskHeartbeat before SendTaskSuccess, so the usage in your code above seems unnecessary.
SendTaskHeartbeat is optional and you use it when you've set "HeartbeatSeconds" on your Task. When you do this, your worker (i.e. the Lambda function in this case) needs to send back regular heartbeats while it is processing work. I'd expect that to run asynchronously while your code above is executing the first line in the try block. The reason for having heartbeats is that you can set a longer TimeoutSeconds (or set it dynamically with TimeoutSecondsPath) than HeartbeatSeconds, so you fail / retry fast when the worker dies (heartbeat timeout) while still allowing your tasks to take longer to complete.
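For illustration, a Task definition along those lines might look like this sketch (function name, durations and state name are placeholders):
"Call external API": {
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
  "Parameters": {
    "FunctionName": "myWorkerFunction",
    "Payload": {
      "taskToken.$": "$$.Task.Token",
      "input.$": "$"
    }
  },
  "HeartbeatSeconds": 60,
  "TimeoutSeconds": 3600,
  "End": true
}
With this in place, Step Functions fails the Task with States.Timeout if no SendTaskHeartbeat (or SendTaskSuccess/SendTaskFailure) arrives within 60 seconds, even though the Task as a whole is allowed to run for up to an hour.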
That said, it's not clear why you are using .waitForTaskToken with Lambda. Usually, you can just use the default Request Response integration pattern with Lambda. This uses the synchronous invoke mode for Lambda and will return the response back to you without you needing to integrate back with Step Functions in your Lambda code. Possibly you are reading these off of an SQS queue for concurrency control or something. But if not, just use Request Response.

Winston Force flush before ending lambda execution

I'm trying to use Winston to send logs to Datadog from an AWS Lambda. The problem with Lambdas is that once we return a response, the Lambda execution stops and Winston isn't given time to flush the logs.
Is there a way I can force the flush before returning? I'm trying this but it doesn't seem to do the trick:
async function handler (event): Promise<FormattedJSONResponse> {
  const logger = getLogger()
  // do some work
  await closeLogger(logger)
  return awsResponse
}

function closeLogger (logger: Logger): Promise<any> {
  const loggerDone = new Promise((resolve, _) => {
    logger.on('finish', () => {
      resolve(logger)
    })
  })
  logger.end()
  logger.close()
  return loggerDone
}
Versions:
AWS Lambda with nodejs 12
Winston: 3.3.3
Thanks for your help
First of all, I don't understand why you would want to send your logs from within your Lambda function. If you do so, your Lambda function will run longer just to process the logs, meaning you will be charged for the time it takes to send them to Datadog.
Instead, you could save the logs to CloudWatch. To avoid high CloudWatch charges, set the retention to a rather short time, maybe one day. On the CloudWatch log group you can then add a subscription filter whose target can be another Lambda function. This "log-processor" Lambda function will process and transform the logs and send them to Datadog. With this architecture your first Lambda function, containing the business logic, won't fail if Datadog cannot be reached, for instance. It makes your architecture more resilient and gives better separation of concerns. Yan Cui wrote a great article on "Centralised logging for AWS Lambda".
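A rough sketch of that wiring with boto3, assuming placeholder names and ARNs (the same can be done via the console, CLI or infrastructure-as-code):
import boto3

logs = boto3.client("logs")
lambda_client = boto3.client("lambda")

# Allow CloudWatch Logs to invoke the log-processor function (placeholder ARNs).
lambda_client.add_permission(
    FunctionName="log-processor",
    StatementId="cloudwatch-logs-invoke",
    Action="lambda:InvokeFunction",
    Principal="logs.amazonaws.com",
    SourceArn="arn:aws:logs:eu-west-1:123456789012:log-group:/aws/lambda/business-logic:*",
)

# Subscribe the log-processor function to the business-logic function's log group.
logs.put_subscription_filter(
    logGroupName="/aws/lambda/business-logic",
    filterName="ship-to-datadog",
    filterPattern="",  # empty pattern forwards every log event
    destinationArn="arn:aws:lambda:eu-west-1:123456789012:function:log-processor",
)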
Another approach, still separating your logging from your Lambda function's business logic to some degree, builds upon Lambda extensions, namely the Lambda Logs API.
Put simply, Lambda extensions add an extra layer to your function but are not part of the Lambda function's code itself. Probably the best part for you: Datadog already offers a ready-to-use extension, which is responsible for:
Pushing real-time enhanced Lambda metrics, custom metrics, and traces from the Datadog Lambda Library to Datadog.
Forwarding logs from your Lambda function to Datadog.
For more info on Lambda extensions follow the links mentioned above or have a look at Yan Cui's post "Lambda Logs API: a new way to process Lambda logs in real-time"
After spending 4 hours on this issue, I found no other way (that works, isn't buggy and is transport agnostic) than to use an arbitrary timeout before returning a response.
This example is for NextJS but you can easily remove res: NextApiResponse.
export const gracefulExit = (response: any, res: NextApiResponse) => {
  setTimeout(() => {
    res.send({ ...response, sessionId });
  }, 400);
};
Then in all my serverless functions I don't do res.send({x}) but rather gracefulExit({x}, res)

How to send a CloudWatchEvent from a lambda to an EventBridge destination

I have a Lambda which is triggered by an EventBridge custom bus. I want to send another event to the custom bus at the end of the function processing. I created a destination in the Lambda to send to the same custom bus.
I have the following code where the function handler will return a CloudWatchEvent. This is not working.
public async Task<CloudWatchEvent<object>> FunctionHandler(CloudWatchEvent<object> evnt, ILambdaContext context)
{
    return await ProcessMessageAsync(evnt, context);
}
My Lambda was being triggered by an S3 input event (which is asynchronous). I tried adding a destination on Lambda "success" pointing to an EventBridge bus, and created a rule to capture that and send it to CloudWatch Logs, but it didn't seem to work.
It turns out that, while creating the rule in EventBridge, the event pattern was set to:
{
  "source": ["aws.lambda"]
}
Which is what you get if you are using the console and selecting AWS Lambda as the AWS Service.
Infuriated, I couldn't seem to get it to work even with a simple event. On further inspection, I looked at the input event and realized that it wants lambda and not aws.lambda. It is also mentioned in the documentation: https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html
So to fix it, I changed it to
{
  "source": ["lambda"]
}
and it worked for me.
Have you given AWS Lambda Destinations a shot? There are four types of destinations supported (a configuration sketch follows the list):
SQS queue
SNS topic
EventBridge event bus
Another Lambda function
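A minimal sketch of the EventBridge option with boto3 (function name and bus ARN are placeholders); note that destinations only fire for asynchronous invocations, such as the S3-triggered case above:
import boto3

lambda_client = boto3.client("lambda")

# Send a "success" event to a custom EventBridge bus after each async invocation.
lambda_client.put_function_event_invoke_config(
    FunctionName="my-function",
    DestinationConfig={
        "OnSuccess": {
            "Destination": "arn:aws:events:us-east-1:123456789012:event-bus/my-custom-bus"
        }
    },
)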

Is there any way to trigger a AWS Lambda function at the end of an AWS Glue job?

Currently I'm using an AWS Glue job to load data into RedShift, but after that load I need to run some data cleansing tasks probably using an AWS Lambda function. Is there any way to trigger a Lambda function at the end of a Glue job? Lambda functions can be triggered using SNS messages, but I couldn't find a way to send an SNS at the end of the Glue job.
@oreoluwa is right, this can be done using CloudWatch Events.
From the Cloudwatch dashboard:
Click on 'Rules' from the left menu
For 'Event Source', choose 'Event Pattern' and in 'Service Name' choose 'Glue'
For 'Event Type' choose 'Glue Job State Change'
On the right side of the page, in the 'Targets' section, click 'Add Target' -> 'Lambda Function' and then choose your function.
The event you'll get in Lambda will be of the format:
{
  "version": "0",
  "id": "a9bc90be-xx00-03e0-9bc5-a0a0a0a0a0a0",
  "detail-type": "Glue Job State Change",
  "source": "aws.glue",
  "account": "xxxxxxxxxx",
  "time": "2018-05-10T16:17:03Z",
  "region": "us-east-2",
  "resources": [],
  "detail": {
    "jobName": "xxxx_myjobname_yyyy",
    "severity": "INFO",
    "state": "SUCCEEDED",
    "jobRunId": "jr_565465465446788dfdsdf546545454654546546465454654",
    "message": "Job run succeeded"
  }
}
Since AWS Glue has started supporting Python, you can probably follow the path below to achieve what you desire. The sample script below shows how to do that:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3 ## Step-2
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## Do all ETL stuff here
## Once the ETL completes
lambda_client = boto3.client('lambda') ## Step-3
response = lambda_client.invoke(FunctionName='string') ## Step-4
Create a Python-based Glue job (to perform the ETL on Redshift).
In the job script, import boto3 (you need to add this package as a script library).
Make a connection to Lambda using boto3.
Invoke the Lambda function using the boto3 invoke() call once the ETL completes.
Please make sure that the role you use while creating the Glue job has permission to invoke Lambda functions.
Refer to the Boto3 documentation for Lambda.
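If the cleanup Lambda needs context from the Glue job, a slight variation on the last step (function name and payload keys are illustrative) could be:
import json
import boto3

lambda_client = boto3.client("lambda")
response = lambda_client.invoke(
    FunctionName="redshift-cleanup",                      # placeholder function name
    InvocationType="Event",                               # async: the Glue job doesn't wait for the result
    Payload=json.dumps({"jobName": args["JOB_NAME"]}),    # args from getResolvedOptions above
)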
No, currently you can't trigger a Lambda function directly at the end of a Glue job, because AWS does not yet provide Glue as a Lambda trigger. If you look at the list of available Lambda triggers after you create a function, you will see most AWS services there, but not AWS Glue. So, for now, it is not possible directly, but maybe it will be in the future.
But I would like to mention that you can actually control the flow of Glue scripts from your Lambda function (I did it using Python; I am sure other languages support this too). My use case was that whenever I uploaded an object to an S3 bucket, a Lambda function was triggered, which read the object file and started my Glue job. And once the Glue job completed, I would write my file back to the S3 bucket linked to this Lambda function.
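A minimal sketch of the Lambda side of that flow (job name and argument keys are placeholders):
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # Extract the uploaded object from the S3 event notification.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Start the Glue job, passing the object location as job arguments.
    response = glue.start_job_run(
        JobName="my-etl-job",  # placeholder job name
        Arguments={"--source_bucket": bucket, "--source_key": key},
    )
    return response["JobRunId"]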
@ace and @adeel have part of the solution, but you could get this resolved by creating the CloudWatch rule with the following event pattern:
{
  "source": ["aws.glue"],
  "detail-type": ["Glue Job State Change"],
  "detail": {
    "jobName": ["<YourJobName>"],
    "state": ["SUCCEEDED"]
  }
}
Lambda can be triggered on S3 put. You can put a dummy file on S3 as the last step of the Glue job, which would in turn trigger Lambda. I have tested this.
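For example, the last lines of the Glue script could write a marker object to a prefix that the Lambda's S3 trigger watches (bucket and key are placeholders):
import boto3

# Writing this object fires the S3 "put" notification that triggers the Lambda.
s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-etl-bucket",
    Key="glue-job-done/marker.json",
    Body=b'{"status": "done"}',
)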
You can orchestrate your AWS Glue Jobs and AWS Lambda functions by using AWS Step Functions. Here is a blog post that explains how to do it and gives an example: https://aws.amazon.com/blogs/big-data/orchestrate-multiple-etl-jobs-using-aws-step-functions-and-aws-lambda/
In essence, when the Glue job finishes (success or failure), your Step Functions workflow can catch the outcome and invoke your Lambda function.
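A rough sketch of such a state machine (job and function names are placeholders):
{
  "StartAt": "Run Glue job",
  "States": {
    "Run Glue job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "my-etl-job"
      },
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "Cleanup Lambda"
        }
      ],
      "Next": "Cleanup Lambda"
    },
    "Cleanup Lambda": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "cleanup-function",
        "Payload.$": "$"
      },
      "End": true
    }
  }
}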
Yes, it is possible to trigger a Lambda, but for this we have to take the help of EventBridge.
Please follow the instructions below.
Go to EventBridge; under Events you will find Rules. Click on it, then click Create rule. Give your rule a suitable name, make sure the radio button is set to "Rule with an event pattern", and click Next. The event source will be "AWS events or EventBridge partner events", and in the creation method select "Use pattern form".
In the event pattern, select "AWS services" as the event source, select Glue as the AWS service, and in the new drop-down that becomes enabled select "Glue Job State Change".
Then, on the right side, the event pattern is shown. Click Edit pattern and adjust it as per your need, for example:
{
  "detail-type": ["Glue Job State Change"],
  "source": ["aws.glue"],
  "detail": {
    "jobName": ["Your glue Name"],
    "state": ["FAILED"]
  }
}
For state you can choose any of STARTING, RUNNING, STOPPING, STOPPED, SUCCEEDED, FAILED, ERROR, WAITING and TIMEOUT.
Don't use any other field, unless you are running on an EC2 instance, in which case you have to use the resources field (you can place it next to source).
Then click Next, select "AWS service" as the target type, select "Lambda function", and choose your Lambda function name in the drop-down that appears after selecting the target; then Next, Next, and Save.
Congrats, you have successfully created the configuration to trigger a Lambda function based on a Glue job.
