How to ensure a Step Function executes Parameterized Query properly in AWS? - aws-lambda

I'm currently trying to execute an Athena Query during a State Machine. The query itself needs a date variable to use in several WHERE statements so I'm using a Lambda to generate it.
When I run EXECUTE prepared-statement USING 'date', 'date', 'date'; directly in Athena, I get the results I expect so I know the query is formed correctly, but when I try to do it in the state machine, it gives me the following error:
SYNTAX_ERROR: line 19:37: Unexpected parameters (integer) for function date. Expected: date(varchar(x)) , date(timestamp) , date(timestamp with time zone)
So my best guess is that I'm somehow not passing the execution parameters correctly.
The Lambda that calculates the date returns it in a string with the format %Y-%m-%d, and in the State Machine I make sure to pass it to the output of every State that needs it. Then I get a named query to create a prepare statement from within the state machine. I then use that prepared statement to run an EXECUTE query that requires the date multiple times, so I use an intrinsic function to turn it into an array:
{
"StartAt": "calculate_date",
"States": {
"calculate_date": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"Payload.$": "$",
"FunctionName": "arn:aws:lambda:::function:calculate_date:$LATEST"
},
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException",
"Lambda.TooManyRequestsException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"Next": "get_query",
"ResultSelector": {
"ExecDate.$": "$.Payload.body.exec_date"
}
},
"get_query": {
"Type": "Task",
"Next": "prepare_query",
"Parameters": {
"NamedQueryId": "abc123"
},
"Resource": "arn:aws:states:::aws-sdk:athena:getNamedQuery",
"ResultPath": "$.Payload"
},
"prepare_query": {
"Type": "Task",
"Next": "execute_query",
"Parameters": {
"QueryStatement.$": "$.Payload.NamedQuery.QueryString",
"StatementName": "PreparedStatementName",
"WorkGroup": "athena-workgroup"
},
"Resource": "arn:aws:states:::aws-sdk:athena:createPreparedStatement",
"ResultPath": "$.Payload"
},
"execute_query": {
"Type": "Task",
"Resource": "arn:aws:states:::athena:startQueryExecution",
"Parameters": {
"ExecutionParameters.$": "States.Array($.ExecDate, $.ExecDate, $.ExecDate)",
"QueryExecutionContext": {
"Catalog": "catalog_name",
"Database": "database_name"
},
"QueryString": "EXECUTE PreparedStatementName",
"WorkGroup": "athena-workgroup",
"ResultConfiguration": {
"OutputLocation": "s3://bucket"
}
},
"End": true
}
}
}
The execution of the State Machine returns successfully, but the query doesn't export the results to the bucket, and when I click on the "Athena query execution" link in the list of events, it takes me to the Athena editor page where I see the error listed above
https://i.stack.imgur.com/pxxOm.png
Am I generating the ExecutionParameters wrong? Does the createPreparedStatement resource need a different syntax for the query parameters? I'm truly at a lost here, so any help is greatly appreciated

I just solved my problem. And I'm posting this answer in case anyone comes across the same issue.
Apparently, the ExecutionParameters paremeter in an Athena StartQueryExecution state does not respect the variable type of a JSONPath variable, so you need to manually add the single quotes when forming your array. I solved this by adding a secondary output from the lambda, with the date wrapped in single quotes so that when I make the array using instrinsic functions and pass it to the query execution, it forms the query string correctly.
Transform the output from the lambda like this:
"ExecDateQuery.$": "States.Format('\\'{}\\'', $.Payload.body.exec_date)"
And use ExecDateQuery in the array intrinsic function, instead of ExecDate.
The final State Machine would look like this:
{
"StartAt": "calculate_date",
"States": {
"calculate_date": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"Payload.$": "$",
"FunctionName": "arn:aws:lambda:::function:calculate_date:$LATEST"
},
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException",
"Lambda.TooManyRequestsException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"Next": "get_query",
"ResultSelector": {
"ExecDate.$": "$.Payload.body.exec_date",
"ExecDateQuery.$": "States.Format('\\'{}\\'', $.Payload.body.exec_date)"
}
},
"get_query": {
"Type": "Task",
"Next": "prepare_query",
"Parameters": {
"NamedQueryId": "abc123"
},
"Resource": "arn:aws:states:::aws-sdk:athena:getNamedQuery",
"ResultPath": "$.Payload"
},
"prepare_query": {
"Type": "Task",
"Next": "execute_query",
"Parameters": {
"QueryStatement.$": "$.Payload.NamedQuery.QueryString",
"StatementName": "PreparedStatementName",
"WorkGroup": "athena-workgroup"
},
"Resource": "arn:aws:states:::aws-sdk:athena:createPreparedStatement",
"ResultPath": "$.Payload"
},
"execute_query": {
"Type": "Task",
"Resource": "arn:aws:states:::athena:startQueryExecution",
"Parameters": {
"ExecutionParameters.$": "States.Array($.ExecDateQuery, $.ExecDateQuery, $.ExecDateQuery)",
"QueryExecutionContext": {
"Catalog": "catalog_name",
"Database": "database_name"
},
"QueryString": "EXECUTE PreparedStatementName",
"WorkGroup": "athena-workgroup",
"ResultConfiguration": {
"OutputLocation": "s3://bucket"
}
},
"End": true
}
}
}

Related

How to create a step function that iterates over a list of strings

I'm trying to create my first AWS step function. I needs to iterated over a list of string and pass each value to a lambda function. I've gotten started but I'm not understanding how to reference the current element in the list to pass was parameter to the lambda function.
{
"StartAt": "Handle Loaders",
"States": {
"Handle Loaders": {
"Type": "Map",
"ItemsPath": "$.InputData"
"Iterator": {
"StartAt": "Execute Loader"
"Execute Loader": {
"Type": "Task"
"Resource": !Ref DataLoader,
"Parameters": {
"SeedData": <same for every iteration>
"Loader": <the current string iterated over>
}
"End": true
}
}
"End": true
}
}
}
I imagine the input would looks something like this
{
"InputData: {
"SeedData": <somedata_values>,
"Loaders": ["Makes", "Models", "Styles"],
}
}
I think this is the definition that you are looking for:
{
"StartAt": "Handle Loaders",
"States": {
"Handle Loaders": {
"Type": "Map",
"ItemsPath": "$.InputData.Loaders",
"Parameters": {
"loader.$": "$$.Map.Item.Value"
},
"Iterator": {
"StartAt": "Execute Loader",
"States": {
"Execute Loader": {
"Type": "Task",
"Resource": "<arn of your function>",
"Parameters": {
"SeedData.$": "$$.Execution.Input.InputData.SeedData",
"Loader.$": "$.loader"
},
"End": true
}
}
},
"End": true
}
}
}
Here are some useful link that would help you to understand it:
https://docs.aws.amazon.com/step-functions/latest/dg/input-output-contextobject.html
https://docs.aws.amazon.com/step-functions/latest/dg/concepts-input-output-filtering.html
https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-map-state.html
Also, given that you are starting with step functions I'd recommend you 2 useful tools in the console:
Data Flow Simulator
Workflow Studio
You're missing a part of the definition for moving into the tasks inside the Map:
Iterator": {
"StartAt": "Prepare Test Data",
"States": {
"Prepare Test Data": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"OutputPath": "$.TestData",
"Parameters": {
"Payload.$": "$",
"FunctionName": "arn:aws:your:function"
},
"Next": "Call_Test_System
},
"Call_Test_System::{
"Type": "Task",
... ect
specifically note that after StartAt key, there is a key States: that is the mini definition of tasks/states within a map iterator.
You can then use the OutputPath defined (in this case as the base of the json $ under the key TestData
in an InputPath for the next step in the same format, which will pass whatever the first state outputs into the next.
The Iterator also has an OutputPath, where it will place a list of all the responses of the iterations within the map.

Pass Step Function variable to AWS Glue Job Not Working

I'm trying to pass an AWS Step Function variable to a Glue Job parameter, similar to this:
aws-passing-job-parameters-value-to-glue-job-from-step-function
However, this is not working for me. The glue job error message indicates that it's getting the passed variable name--not the actual value of the variable. Here's my Step Function code:
{
"Comment": "Converts CSV files to parquet for a date range.",
"StartAt": "ConfigureCount",
"States": {
"ConfigureCount": {
"Type": "Pass",
"Result": {
"start": 201601,
"end": 201602,
"index": 201601
},
"ResultPath": "$.iterator",
"Next": "Iterator"
},
"Iterator": {
"Type": "Task",
"Resource": "arn:aws:lambda:eu-west-1:123456789:function:date-iterator",
"ResultPath": "$.iterator",
"Next": "IsCountReached"
},
"IsCountReached": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.iterator.continue",
"BooleanEquals": true,
"Next": "ConvertToParquet"
}
],
"OutputPath": "$.iterator",
"Default": "Done"
},
"ConvertToParquet": {
"Comment": "Your application logic, to run a specific number of times",
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "convert-to-parquet",
"Arguments": {
"--DATE_RANGE": "$.iterator.index"
}
},
"ResultPath": "$.iterator.index",
"Next": "Iterator"
},
"Done": {
"Type": "Pass",
"End": true
}
}
}
The step "Iterator"step is calling a Lambda called "date-iterator" which returns JSON similar to the following:
{
"start": "201601",
"end": "201602",
"index": "201601"
}
This was based on this article, so that I can loop through values: Iterating a Loop Using Lambda
My Step Function fails, saying "$.iterator.index" is not a valid date.
How do I pass this value, and not the variable name?
from Amazon States Language (https://states-language.net/spec.html):
If any field within the Payload Template (however deeply nested) has a name ending with the characters ".$", its value is transformed according to rules below and the field is renamed to strip the ".$" suffix.
Based on that adding .$ should solve your issue:
"Parameters": {
"JobName": "convert-to-parquet",
"Arguments": {
"--DATE_RANGE.$": "$.iterator.index"
}
},

Step Functions ignoring Parameters, executing it with default input

I don't know if this is a bug of my own or Amazon's as I'm unable to properly understand what's going on.
I have the following Step Function (It's currently a work in progress), which takes care of validating an input and then sending it to an SQS lambda (CreateInAuth0 Step).
My problem is that apparently the SQS function is executing with the default parameters, fails, retries and then uses the ones specified in the Step Function. (Or at least that's what I'm gathering from the logs).
I want to know why it is ignoring the Parameters in the first go. Here's all the information I can gather.
There seems to be a difference between TaskStateEntered and TaskScheduled (which I believe it's ok).
{
"Comment": "Add the students to the platform and optionally adds them to their respective grades",
"StartAt": "AddStudents",
"States": {
"AddStudents": {
"Comment": "Validates that the students are OK",
"Type": "Task",
"Resource": "#AddStudents",
"Next": "Parallel"
},
"Parallel": {
"Type": "Parallel",
"Next": "FinishExecution",
"Branches": [
{
"StartAt": "CreateInAuth0",
"States": {
"CreateInAuth0": {
"Type": "Task",
"Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
"Parameters": {
"QueueUrl": "#SQS_StudentsRegistrationSQSFifo",
"MessageBody": {
"execId.$": "$.result.execId",
"tenantType.$": "$.result.tenantType",
"tenantId.$": "$.result.tenantId",
"studentsToCreate.$": "$.result.newStudents.success",
"TaskToken.$": "$$.Task.Token",
"studentsSucceeded": []
}
},
"End": true
}
}
}
]
},
"FinishExecution": {
"Comment": "This signals the ending in DynamoDB that the transaction finished",
"Type": "Task",
"End": true,
"Resource": "arn:aws:states:::dynamodb:putItem",
"ResultPath": "$.dynamoDB",
"Parameters": {
"TableName": "#SchonDB",
"Item": {
"pk": {
"S.$": "$.arguments.tenantId:exec:$.id"
},
"sk": {
"S": "result"
},
"status": {
"S": "FINISHED"
},
"type": {
"S": "#EXEC_TYPE"
}
}
},
"Retry": [
{
"ErrorEquals": [
" States.Timeout"
],
"IntervalSeconds": 1,
"MaxAttempts": 5,
"BackoffRate": 2.0
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "FinishExecutionIfFinishErrored"
}
]
},
"FinishExecutionIfFinishErrored": {
"Type": "Pass",
"End": true
}
}
}
Here's the visual:
Here's the execution event history:
Note: I have a try/catch statement inside the SQS function that wraps the entire SQS' execution.

AWS step function: chosing a Resource dynamically

I would like to dynamically chose an AWS Lambda worker based on the result coming from a previous step. Something like {"Resource": "$.worker_arn"}.
"RunWorkers": {
"Type": "Map",
"MaxConcurrency": 0,
"InputPath": "$.output",
"ResultPath": "$.raw_result",
"Iterator": {
"StartAt": "CallWorkerLambda",
"States": {
"CallWorkerLambda": {
"Type": "Task",
"Resource": "$$.worker_arn",
"End": true
}
}
},
"Next": "Aggregate"
},
The input from previous step is expected as following:
[{"worker_arn":..., "output":1}, {"worker_arn":..., "output":1}, ...],
where worker_arn is the same among all workers.
When I write a pipeline like this, the linter complains that it expects an ARN.
Are there any options better than wrapping my worker lambda into another lambda?
Using "Resource": "arm:aws:states:::lambda:invoke" you can set the "FunctionName" field in "Parameters" at runtime using a Path.
{
"StartAt":"CallLambda",
"States":{
"CallLambda":{
"Type":"Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters":{
"FunctionName.$":"$.MyFunction",
"Payload.$": "$"
},
"End":true
}
}
}
https://docs.aws.amazon.com/step-functions/latest/dg/connect-lambda.html

Getting started with Step Functions and I can't get Choice to work correctly

I'm doctoring up my first step function, and as a newb into this I am struggling to make this work right. The documentation on AWS is helpful but lacks examples of what I am trying understand. I found a couple similar issues on the site here, but they didn't really answer my question either.
I have a test Step Function that works really simply. I have a small Lambda function that kicks out a single line JSON with a "Count" from a request in a DynamoDB:
def lambda_handler(event, context):
"""lambda_handler
Keyword arguments:
event -- dict -- A dict of parameters to be validated.
context --
Return:
json object with the hubID from DynamoDB of the new hub.
Exceptions:
None
"""
# Prep the Boto3 resources needed
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('TransitHubs')
# By default we assume there are no new hubs
newhub = { 'Count' : 0 }
# Query the DynamoDB to see if the name exists or not:
response = table.query(
IndexName='Status-index',
KeyConditionExpression=Key('Status').eq("NEW"),
Limit=1
)
if response['Count']:
newhub['Count'] = response['Count']
return json.dumps(newhub)
A normal output would be:
{ "Count": 1 }
And then I create this Step Function:
{
"StartAt": "Task",
"States": {
"Task": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-west-2:OMGSUPERSECRET:function:LaunchNode-get_new_hubs",
"TimeoutSeconds": 60,
"Next": "Choice"
},
"Choice": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.Count",
"NumericEquals": 0,
"Next": "Failed"
},
{
"Variable": "$.Count",
"NumericEquals": 1,
"Next": "Succeed"
}
]
},
"Succeed": {
"Type": "Succeed"
},
"Failed": {
"Type": "Fail"
}
}
}
So I kick off the State Function and I get this output:
TaskStateExited
{
"name": "Task",
"output": {
"Count": 1
}
}
ChoiceStateEntered
{
"name": "Choice",
"input": {
"Count": 1
}
}
ExecutionFailed
{
"error": "States.Runtime",
"cause": "An error occurred while executing the state 'Choice' (entered at the event id #7). Invalid path '$.Count': The choice state's condition path references an invalid value."
}
So my question: I don't get why this is failing with that error message. Shouldn't Choice just pick up that value from the JSON? Isn't the default "$" input the "input" path?
I have figured out what the issue is, and here it is:
In my Python code, I was attempting to json.dumps(newhub) for the response thinking that what I needed was a string output representing the json formatted response. But it appears that is incorrect. When I alter the code to be simply "return newhub" and return the DICT, the step-functions process accepts that correctly. I'm assuming it parses the DICT to JSON for me? But the output difference is clearly obvious:
old Task output from above returning json.dumps(newhub):
{
"name": "Task",
"output": {
"Count": 1
}
}
new Task output from above returning newhub:
{
"Count": 1
}
And the Choice now correctly matches the Count variable in my output.
In case this is helpful for someone else. I also experienced the kind of error you did ( you had the below... just copy-pasting yours... )
{
"error": "States.Runtime",
"cause": "An error occurred while executing the state 'Choice' (entered at the event id #7). Invalid path '$.Count': The choice state's condition path references an invalid value."
}
But my problem turned out when I was missing the "Count" key all together.
But I did not want verbose payloads.
But per reading these docs I discovered I can also do...
"Choice": {
"Type": "Choice",
"Choices": [
{
"And": [
{
"Variable": "$.Count",
"IsPresent": true
},
{
"Variable": "$.Count",
"NumericEquals": 0,
}
],
"Next": "Failed"
},
{
"And": [
{
"Variable": "$.Count",
"IsPresent": true
},
{
"Variable": "$.Count",
"NumericEquals": 1,
}
],
"Next": "Succeed"
}
]
},
even I was facing the same issue. Give the ResultPath as InputPath and give the choice value as .value name
{
"Comment": "Step function to execute lengthy synchronous requests",
"StartAt": "StartProcessing",
"States": {
"StartProcessing": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:703569030910:function:shravanthDemo-shravanthGetItems-GHF7ZA1p6auQ",
"InputPath": "$.lambda",
"ResultPath": "$.lambda",
"Next": "VerifyProcessor"
},
"VerifyProcessor": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.lambda.cursor",
"IsNull": false,
"Next": "StartProcessing"
},
{
"Variable": "$.lambda.cursor",
"IsNull": true,
"Next": "EndOfUpdate"
}
]
},
"EndOfUpdate": {
"Type": "Pass",
"Result": "ProcessingComplete",
"End": true
}
}
}

Resources