Lambda function won't run in parallel via Step Functions state machines - aws-lambda

I am sure I am missing some basic info but my setup is as follows:
I have one Lambda function and two state machines.
Each state machine calls the same Lambda function twice in sequence (with a 30-second pause between calls).
Then I have rules set up to trigger the state machines every minute. Below is what each of my state machines looks like; each Lambda invoke is for the same function.
Each state machine passes different params to the Lambda function, so I am trying to get to a situation where my Lambda function is called every 30 seconds with one set of params (from state machine 1), and the same Lambda function is called with a different set of params (via state machine 2) every 30 seconds.
Looking at the Lambda function logs, it looks like one state machine will not run the Lambda function until the other state machine has completed its entire execution (i.e. calling the Lambda function twice). I would expect the two state machines to run independently of each other, with no timing dependency between them.
Is there some limitation because they are all calling the same Lambda function? Is there some setup issue, or is this just how it works?
Thanks!

From the documentation:
When you invoke a Lambda function, the execution will wait for the function to complete. However, it is possible to call Lambda asynchronously using the InvocationType parameter, as seen in the following example.
To avoid waiting for one Lambda function to end before your step function continues, you have to set the InvocationType to Event in the parameters. However, if your Lambda functions are completely independent of one another, using the Parallel state type may be a better option for you.
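A minimal sketch of such an asynchronous Task state, assuming the arn:aws:states:::lambda:invoke integration (the state name and function ARN are placeholders, not from the question):

"InvokeAsync": {
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke",
  "Parameters": {
    "FunctionName": "arn:aws:lambda:us-east-1:123456789012:function:MyFunction",
    "InvocationType": "Event",
    "Payload.$": "$"
  },
  "End": true
}

With "InvocationType": "Event", the state completes as soon as Lambda accepts the event rather than waiting for the function to finish. And here is what a Parallel state looks like: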
{
  "Comment": "Parallel Example.",
  "StartAt": "LookupCustomerInfo",
  "States": {
    "LookupCustomerInfo": {
      "Type": "Parallel",
      "End": true,
      "Branches": [
        {
          "StartAt": "LookupAddress",
          "States": {
            "LookupAddress": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:AddressFinder",
              "End": true
            }
          }
        },
        {
          "StartAt": "LookupPhone",
          "States": {
            "LookupPhone": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:PhoneFinder",
              "End": true
            }
          }
        }
      ]
    }
  }
}

Related

How to prevent AWS Step Function from inserting the parent key "Input" when it invokes a Lambda?

How can I define an AWS Step Function state that passes precisely the same hash into an invoked Lambda that I supplied to the Step Function (e.g., without pushing the input hash down one level under a new key "Input")?
My Ruby AWS Lambda function assumes the incoming event hash looks like:
{
  "queryStringParameters": {
    "foo": "bar"
  }
}
When I perform a test execution on an AWS Step Function, which invokes that lambda, and supply that same hash shown above, the event hash that gets passed into the lambda is not the same as the hash I provided to the Step Function... it has an extra parent key called "Input":
{
  "Input": {
    "queryStringParameters": {
      "foo": "bar"
    }
  }
}
In the Step Function, the state which invokes the lambda is defined by:
"invoke foobar": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "arn:aws:lambda:xxxx:xxxx:function:xxxx:$LATEST",
"Payload": {
"Input.$": "$"
}
},
"Next": "Done",
"TimeoutSeconds": 10
},
Or will a Step Function always take its input and put it "under" a key called "Input"?
And if that is the case that an "Input" key is always added to the event hash passed to a Lambda function, how does one write a Lambda so it can be invoked from both a Step Function (which assumes a root key of "Input") and an API Gateway (which uses a different root key "queryStringParameters")?
Instead of this:

"Payload": {
  "Input.$": "$"
}

you should do this:

"Payload.$": "$"

That will pass the input directly as the Payload of the lambda function.
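Applied to the state from the question, that looks like this (the ARN placeholders are kept as-is):

"invoke foobar": {
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke",
  "Parameters": {
    "FunctionName": "arn:aws:lambda:xxxx:xxxx:function:xxxx:$LATEST",
    "Payload.$": "$"
  },
  "Next": "Done",
  "TimeoutSeconds": 10
},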

Schedule AWS-Lambda with Java and CloudWatch Triggers

I am new to AWS and AWS Lambda. I have to create a Lambda function to run a cron job every 10 minutes. I am planning to add a CloudWatch trigger to fire it every 10 minutes, but without any event. I looked it up on the internet and found that some event needs to be there to get it running.
I need some clarity and leads on the 2 points below:
Can I schedule a job using AWS Lambda, with CloudWatch triggering it every 10 minutes, without any events?
How does one make it interact with MySQL databases hosted on AWS?
I have my application built on Spring Boot running on multiple instances with a shared database (single source of truth). I have devised everything stated above using Spring's in-built scheduler and proper synchronisation at the DB level using locks, but because of the distributed nature of the instances, I have been advised to do the same using Lambdas.
You need to pass a ScheduledEvent object to the handleRequest() of the Lambda:
handleRequest(ScheduledEvent event, Context context)
Configure a cron job that runs every 10 minutes in your CloudWatch template (if using CloudFormation). This will make sure your Lambda is triggered every 10 minutes.
Make sure to add the dependency below to your pom:
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-lambda-java-events</artifactId>
  <version>2.2.5</version>
</dependency>
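A minimal handler sketch under those assumptions (the class name and log line are illustrative; note that RequestHandler and Context come from the separate aws-lambda-java-core artifact):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.ScheduledEvent;

// Illustrative handler class; put the actual cron-job logic inside handleRequest().
public class CronJobHandler implements RequestHandler<ScheduledEvent, Void> {
    @Override
    public Void handleRequest(ScheduledEvent event, Context context) {
        context.getLogger().log("Scheduled trigger fired at " + event.getTime());
        // ... run the every-10-minutes job here ...
        return null;
    }
}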
Method 2:
You can specify something like this in your CloudFormation template. It does not require any argument to be passed to your handler(), in case you do not need any event-related information. This will automatically trigger your Lambda as per your cron job.
"ScheduledRule": {
"Type": "AWS::Events::Rule",
"Properties": {
"Description": "ScheduledRule",
"ScheduleExpression": {
"Fn::Join": [
"",
[
"cron(",
{
"Ref": "ScheduleCronExpression"
},
")"
]
]
},
"State": "ENABLED",
"Targets": [
{
"Arn": {
"Fn::GetAtt": [
"LAMBDANAME",
"Arn"
]
},
"Id": "TargetFunctionV1"
}
]
}
},
"PermissionForEventsToInvokeLambdaFunction": {
"Type": "AWS::Lambda::Permission",
"Properties": {
"FunctionName": {
"Ref": "NAME"
},
"Action": "lambda:InvokeFunction",
"Principal": "events.amazonaws.com",
"SourceArn": {
"Fn::GetAtt": [
"ScheduledRule",
"Arn"
]
}
}
}
}
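For the 10-minute schedule in the question, a ScheduleCronExpression parameter value of 0/10 * * * ? * would make the joined expression cron(0/10 * * * ? *); rate(10 minutes) is an equivalent ScheduleExpression if you don't need cron syntax.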
If you want to run a cron job with Lambda, a CloudWatch Events rule is the only option.
If you don't want to use CloudWatch Events, then go ahead with an EC2 instance, but EC2 will cost you more than the CloudWatch event.
Note: setting up a CloudWatch Events rule is just like defining a cron job in crontab on any Linux system, nothing much. On a Linux server you define everything in raw form, whereas here it is just UI-based.

Can a lambda in an AWS Step Function know the "execution name" of the step function that launched it?

I have this step function that can sometimes fail and I'd like to record this in a (dynamo) DB. What would be handy is if I could just create a new error handling step and that guy would just pick up the "execution name" from somewhere (didn't find it in the context) and record this as a failure.
Is that possible?
AWS Step Functions recently released a feature called the context object.
Using the $$ notation inside the Parameters block, you can access information about your execution, including the execution name and ARN, the state machine name and ARN, and others.
https://docs.aws.amazon.com/step-functions/latest/dg/input-output-contextobject.html
You can create a state to extract the context details that are then accessible to all the other states, such as:
{
  "StartAt": "ExtractContextDetails",
  "States": {
    "ExtractContextDetails": {
      "Parameters": {
        "arn.$": "$$.Execution.Id"
      },
      "Type": "Pass",
      "ResultPath": "$.contextDetails",
      "Next": "NextStateName"
    },
    ....
  }
}
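Downstream states (and any Lambdas they invoke) will then find the execution ARN at $.contextDetails.arn in their input.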
Yes, it can, but it is not as straightforward as you might hope.
You are right to expect that a Lambda should be able to get the name of the calling state machine. Lambdas are passed a context object that returns information about the caller. However, that object is null when a state machine calls your Lambda. This means two things: you will have to work harder to get what you need now, and this might well be implemented in the future.
Right now, the only way I know of achieving this is by starting the execution of the state machine from within another Lambda and passing the name in via the input JSON. Here is my code in Java...
// assumes a Step Functions client, e.g.:
// AWSStepFunctions sf = AWSStepFunctionsClientBuilder.defaultClient();
String executionName = UUID.randomUUID().toString(); // or any other unique name
StartExecutionRequest startExecutionRequest = new StartExecutionRequest()
        .withStateMachineArn(stateMachineArn)
        .withInput("{\"executionName\": \"" + executionName + "\"}") // note the escaped quotes
        .withName(executionName);
StartExecutionResult startExecutionResult = sf.startExecution(startExecutionRequest);
String executionArn = startExecutionResult.getExecutionArn();
If you do this, you will now have the name of your execution in the input JSON of your first step. If you want to use it in other steps, you should pass it around.
You might also want the ARN of the execution so you can call state machine methods from within your activities or tasks. You can construct the ARN yourself using the executionName...
arn:aws:states:us-east-1:accountId:execution:stateMachineName:executionName
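For example, in the same Java code (accountId and stateMachineName are assumed to be known to the caller):

String executionArn = "arn:aws:states:us-east-1:" + accountId
        + ":execution:" + stateMachineName + ":" + executionName;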
No. Unless you pass that information in the event, Lambda doesn't know whether or not it's part of a step function. Step functions orchestrate lambdas and maintain state between lambdas.
"States": {
"Get Alter Query": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"OutputPath": "$.Payload",
"Parameters": {
"FunctionName": "arn:aws:lambda:ap-northeast-2:1111111:function:test-stepfuction:$LATEST",
"Payload": {
"body.$": "$",
"context.$": "$$"
}
},
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException",
"Lambda.TooManyRequestsException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"Next": "Alter S3 Location"
}
}
I solved it by adding context to the payload.
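With the mapping above, the whole context object arrives under the context key, so the handler can read the execution name from event.context.Execution.Name.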
I highly recommend, when using step functions, specifying some sort of key in the step function configuration. For my step functions I always provide:
"ResultPath": "$",
"Parameters": {
"source": "StepFunction",
"type": "LAMBDA_METHOD_SWITCH_VALUE",
"payload.$": "$"
},
And have each call to lambda use the type field to determine what code to call. When your code fails, wrap it in a try/catch and explicitly use the passed-in type, which can be the name of the step, to determine what to do next.

How to get a scheduled lambda to run again unless specified in callback

I have a lambda function which can take longer than 5 minutes to complete. It uses a scheduled cron trigger, but in the event it doesn't complete, I would like the lambda to run again unless the callback provided to the handler has been called to mark it complete. Is there a way to do this with scheduled triggers?
You can create a Step Function where the function doing the long-running job is put in a retry block. A sample follows:
{
  "Comment": "function comment",
  "StartAt": "Step1",
  "States": {
    "Step1": {
      "Type": "Task",
      "Resource": "place your lambda ARN",
      "Next": "Step2"
    },
    "Step2": {
      "Type": "Task",
      "Resource": "place your lambda ARN",
      "Catch": [{
        "ErrorEquals": ["your error thrown from lambda, e.g. job not completed"],
        "Next": "Step2"  // here it retries the function again
      }],
      "End": true
    }
  }
}
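The same loop can also be written with ASL's built-in Retry field instead of a self-referencing Catch; a sketch for Step2, with an illustrative error name and timings:

"Step2": {
  "Type": "Task",
  "Resource": "place your lambda ARN",
  "Retry": [{
    "ErrorEquals": ["JobNotCompleted"],
    "IntervalSeconds": 30,
    "MaxAttempts": 10
  }],
  "End": true
}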
AWS Lambda limits the maximum execution time of a single invocation to 5 minutes, so we should write Lambda functions that perform long-running tasks as recursive functions, using context.getRemainingTimeInMillis().
When your function is invoked, the context object allows you to find out how much time is left in the current invocation.
Suppose you have an expensive task that can be broken into small tasks that can be processed in batches. At the end of each batch, use context.getRemainingTimeInMillis() to check whether there is still enough time to keep processing; otherwise, recurse and pass along the current position so the next invocation can continue from where this one left off, as follows:
module.exports.handler = (event, context, callback) => {
  let position = event.position || 0;
  do {
    // ... process the tasks in small batches that can each complete
    // in, say, less than 10s (the elided code advances `position`
    // as batches complete)
  } while (position < totalTaskCount &&
           context.getRemainingTimeInMillis() > 10000); // stop when less than 10s is left
  if (position < totalTaskCount) {
    let newEvent = Object.assign(event, { position });
    recurse(newEvent); // re-invoke this function with the updated position
    callback(null, `to be continued from [${position}]`);
  } else {
    callback(null, "all done");
  }
};
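The snippet leaves recurse undefined; a minimal sketch using the Node AWS SDK to re-invoke the current function asynchronously (the function name is read from the Lambda runtime environment) could look like:

const AWS = require("aws-sdk");
const lambda = new AWS.Lambda();

// Fire-and-forget self-invocation so the next run resumes from `position`.
const recurse = (payload) =>
  lambda.invoke({
    FunctionName: process.env.AWS_LAMBDA_FUNCTION_NAME, // set by the Lambda runtime
    InvocationType: "Event", // asynchronous: don't wait for the new invocation
    Payload: JSON.stringify(payload)
  }).promise();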
I would not use a trigger in this case. Here is how I would handle it:
Create a Kinesis stream and a Lambda.
In the Lambda, do the smallest operation possible and finish it. Don't make the task bigger or longer-running.
At the end of your code, post an event to the Kinesis stream.
Add a trigger to your Lambda from the stream.
This makes a continuous loop that asynchronously chugs through your business logic.
The problem with using a scheduled trigger is that even if your previous run has not completed, it will trigger your next run, so there is a chance two instances of the lambda run in parallel.
Hope it helps.

Can a lambda in an AWS Step Function know the name of the step it is in?

For a lambda executed within a step function, I kind of expected that I could get the name of the current step from the lambda context, but it doesn't seem to be that simple.
Is there any way to get the name of the current step in a lambda that is executed within a Step Function?
UPDATE: as of 05/23/2019 this answer is outdated, since AWS introduced a way to access the current step within a step function; see the accepted answer.
Looks like you are right, the current step doesn't get exposed through the context variable.
So the information that would allow you to identify which stage the state machine is currently in should be passed from the previous step (i.e. from the previous lambda). This seems to be the most correct option.
Or, as a workaround, you could try inserting Pass states before calling your lambda functions, to pass an id that helps you identify the current stage.
Suppose you have two steps in your state machine:
"Step1Test": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:xxxxxxxxxx:function:step1test",
"Next": "Step2Test"
},
"Step2Test": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:xxxxxxxxxx:function:step2test",
"End": true
}
Here is how you can provide your lambda functions with the current step id, passed via event.stepId:
"Step1TestEnter": {
"Type": "Pass",
"Next": "Step1Test",
"Result": "Step1Test",
"ResultPath": "$.stepId"
},
"Step1Test": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:xxxxxxxxxx:function:step1test",
"Next": "Step2TestEnter"
},
"Step2TestEnter": {
"Type": "Pass",
"Next": "Step2Test",
"Result": "Step2Test",
"ResultPath": "$.stepId"
},
"Step2Test": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:xxxxxxxxxx:function:step2test",
"End": true
}
AWS Step Functions released the context object, where you can access information about your execution.
You can use it to send the execution ARN to your lambda.
https://docs.aws.amazon.com/step-functions/latest/dg/input-output-contextobject.html
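In particular, $$.State.Name holds the name of the current state, so a Task can forward it in its payload. A sketch using the lambda:invoke integration and the placeholder ARNs from the example above:

"Step1Test": {
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke",
  "Parameters": {
    "FunctionName": "arn:aws:lambda:us-east-1:xxxxxxxxxx:function:step1test",
    "Payload": {
      "stepName.$": "$$.State.Name",
      "input.$": "$"
    }
  },
  "Next": "Step2Test"
}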
Based on the documentation, I was using Parameters and $$ to pass the step function context object into my lambda function. To test whether it was working, I thought I could go to the Step Functions console, start a new execution, and see the context object being passed into the step function on the "Step Input" tab. To my dismay, it wasn't displayed there. I added some diagnostic logging to the lambda function, serializing the input to JSON and logging it to CloudWatch. The CloudWatch logs showed that the context object was being passed in.
Anyway, I thought I would post this here to maybe save someone the time I spent figuring this one out: the context object does get passed in, it just doesn't show up in the step function console.
I highly recommend, when using step functions, specifying some sort of key in the step function configuration. For my step functions I always provide:
"ResultPath": "$",
"Parameters": {
"source": "StepFunction",
"type": "LAMBDA_METHOD_SWITCH_VALUE",
"payload.$": "$"
},
And have each call to lambda use the type field to determine what code to call. I have found this to be much easier to implement.
