AWS step functions - Nested Map type - aws-lambda

I am new to AWS Step Functions and I'm trying to nest multiple Map-type Lambda tasks, but I'm seeing unexpected behaviour. get_item_list outputs a list of items to be fed into the get_item_pages Map. The get_item_pages Map iterates over get_info. I then want to pass each output of get_info to another Map type, get_more_info.

The reason I am using Map states is to utilise their fan-out functionality. The first Map state, get_item_pages, fans out as expected: in the Lambda CloudWatch logs I can see 10+ log streams start. The nested get_more_info Map does not appear to fan out in the same fashion, though; I often see only a single log stream in CloudWatch for it.
Am I missing something obvious in my implementation or am I going about this in an entirely wrong way?
{
  "Comment": "A nested map example",
  "StartAt": "get_item_list",
  "States": {
    "get_item_list": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...",
      "Next": "get_item_pages",
      "ResultPath": "$.data"
    },
    "get_item_pages": {
      "Type": "Map",
      "ItemsPath": "$.data.all_items",
      "MaxConcurrency": 100,
      "Iterator": {
        "StartAt": "get_info",
        "States": {
          "get_info": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...",
            "ResultPath": "$.data",
            "Next": "get_more_info"
          },
          "get_more_info": {
            "Type": "Map",
            "ItemsPath": "$.data.all_data",
            "MaxConcurrency": 100,
            "Iterator": {
              "StartAt": "get_detailed_info",
              "States": {
                "get_detailed_info": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:...",
                  "End": true
                }
              }
            },
            "End": true
          }
        }
      },
      "End": true
    }
  }
}

This is an old question, but I thought I'd answer for anyone else. I can't say for certain, but it seems likely that you're seeing this because the "End": true (meant to indicate completion of the map) is supposed to be at the Task level, not the Map level. Putting "End": true at the Map level indicates the end of the state machine.
The exact outcome is probably indeterminate, because AWS doesn't tell you EXACTLY how the iterations will flow through the map, only that they will do so in accordance with the concurrency you set, here 100. Even so, all iterations (up to 100) are not guaranteed to move through the map concurrently.
So what's likely happening is that the first iteration flows through the nested get_more_info map and then triggers "End": true, thus ending the entire state machine.

Related

AWS Stepfunction pass data to next lambda without all the extra padding

I have created a state machine with AWS CDK (TypeScript) and it all works fine, except that the output of Lambda 1, which is the input for Lambda 2, contains some state-machine padding that I am not interested in.
Definition of state machine:
{
  "StartAt": "",
  "States": {
    "...applicationPdf": {
      "Next": "...setApplicationProcessed",
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "...applicationPdf",
        "Payload.$": "$"
      }
    },
    "...setApplicationProcessed": {
      "Next": "Done",
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "...applicationPdf",
        "Payload.$": "$"
      }
    },
    "Done": {
      "Type": "Succeed"
    }
  }
}
Output of Lambda1 (applicationPdf):
{
  "ExecutedVersion": "$LATEST",
  "Payload": {
    ...
  },
  "SdkHttpMetadata": {
    "AllHttpHeaders": {
      ...
    },
    "HttpHeaders": {
      ...
    },
    "HttpStatusCode": 200
  },
  "SdkResponseMetadata": {
    ...
  },
  "StatusCode": 200
}
So I am only interested in Payload, not all the other stuff.
The reason I want to do this is that I want to run the 2nd Lambda separately: I just want the event going into the Lambda to be the Payload object, not the object with ExecutedVersion etc.
Does anyone know how to do this?
I will have a look at the Parameters option of the definition, maybe the answer lies there.
Thanks for your question and for your interest in Step Functions.
The ResultSelector and OutputPath fields can be used to manipulate the output of a state, which can be particularly helpful when a state outputs values which you do not need access to in subsequent states. The difference between them is that ResultSelector is applied before the state's ResultPath is applied, while OutputPath is applied after it.
As you noted, you can use OutputPath to filter out any unwanted metadata before being passed on to the next state.
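For example, with the arn:aws:states:::lambda:invoke integration, setting OutputPath to $.Payload keeps only the function's return value; a sketch using the state names from the question:

```json
"...applicationPdf": {
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke",
  "Parameters": {
    "FunctionName": "...applicationPdf",
    "Payload.$": "$"
  },
  "OutputPath": "$.Payload",
  "Next": "...setApplicationProcessed"
}
```

With this in place, the next state receives only the contents of Payload instead of the full invocation metadata.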
I found one solution: add the outputPath:
return new LambdaInvoke(this, 'lamba', {
  lambdaFunction: Function.fromFunctionArn(this, name, this.createLabmdaArn('applicationPdf')),
  outputPath: '$.Payload',
});
This seems to work and might be THE solution.

ElasticSearch doesn't return "deletebyquery" task when I query list of tasks

I'm using ElasticSearch 7.1.1 and the delete_by_query API to remove the required documents asynchronously. That query returns a task id to me. When I execute the next query:
GET {elasticsearch_cluster_url}/_tasks/{taskId}
I'm able to get the corresponding document for the task. But when I try to execute this query:
GET {elasticsearch_cluster_url}/_tasks?detailed=true&actions=*/delete/byquery
I don't see my task in the result list.
I also tried to execute such query:
GET {elasticsearch_cluster_url}/_tasks
And I got such response:
{
  "nodes": {
    "hWZQF7w_Rs6_ZRDZkGpSwQ": {
      "name": "0d5027c3ae6f0bb0105dcdb04470af43",
      "roles": [
        "master",
        "data",
        "ingest"
      ],
      "tasks": {
        "hWZQF7w_Rs6_ZRDZkGpSwQ:194259297": {
          "node": "hWZQF7w_Rs6_ZRDZkGpSwQ",
          "id": 194259297,
          "type": "transport",
          "action": "cluster:monitor/tasks/lists",
          "start_time_in_millis": 1596710876682,
          "running_time_in_nanos": 214507,
          "cancellable": false,
          "headers": {}
        },
        "hWZQF7w_Rs6_ZRDZkGpSwQ:194259298": {
          "node": "hWZQF7w_Rs6_ZRDZkGpSwQ",
          "id": 194259298,
          "type": "direct",
          "action": "cluster:monitor/tasks/lists[n]",
          "start_time_in_millis": 1596710876682,
          "running_time_in_nanos": 84696,
          "cancellable": false,
          "parent_task_id": "hWZQF7w_Rs6_ZRDZkGpSwQ:194259297",
          "headers": {}
        }
      }
    }
  }
}
I'm not sure; maybe that query returns only the tasks from the one cluster node that accepted the query? Why is my task missing from the last query? Is there some mistake in my query parameters?
If your delete-by-query call is very short-lived, you're not going to see it in _tasks after the query has run.
If you want to keep a trace of your calls, you need to add ?wait_for_completion=false to your call.
While the query is running, you're going to see it using GET _tasks; once it has finished running, you won't see it in GET _tasks anymore, but in GET .tasks/_search instead.
After it's done running, and provided you specified wait_for_completion=false, you can see the details of your finished task with GET .tasks/task/<task-id>
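Put together, the sequence might look like this (the index name and query body are placeholders; the task id is whatever your delete call returned):

```
POST my-index/_delete_by_query?wait_for_completion=false
{
  "query": { "match_all": {} }
}

GET _tasks?detailed=true&actions=*/delete/byquery

GET .tasks/task/hWZQF7w_Rs6_ZRDZkGpSwQ:194259297
```

The first call returns immediately with a task id; the second shows the task only while it is still running, and the third reads the persisted result from the .tasks index after completion.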

jmeter json path Conditional Extraction at peer level

I'm using jmeter v2.13 and jp#gc - JSON Path Extractor.
Here is my JSON sample:
{
  "views": [
    {
      "id": 9701,
      "name": " EBS: EAS: IDC (EAS MBT IDC)",
      "canEdit": true,
      "sprintSupportEnabled": true,
      "filter": {
        "id": 55464,
        "name": "Filter for EBS: EAS: IDC & oBill Boar",
        "query": "project = \"EBS: EAS: IDC\"",
        "owner": {},
        "canEdit": false,
        "isOrderedByRank": true,
        "permissionEntries": [
          {
            "values": [
              {
                "type": "Shared with the public",
                "name": ""
              }
            ]
          }
        ]
      },
      "boardAdmins": {}
    },
    {}
  ]
}
Is it possible to extract views[x].id where there exists an entry views[x].filter.permissionEntries[*].values[*].type that equals Shared with the public?
How would I do it?
Thanks
The JSON path query would look like this (I admit I didn't try it in JMeter):
$.views[?(@.filter[?(@.permissionEntries[?(@.values[?(@.type == "Shared with the public")])])])].id
Explanation:
We expect the root ($) to have views, and each item of it to have the property id. The rest (in []) are conditions to select only views items matching a predefined condition. Hence $.views[conditions].id.
The conditions in this case are nested one within the other, but the main parts are:
We define a condition as a filter ?(...)
We ask the filter to look under the current item (@) for a specific child item (.child); the child may have its own conditions ([...]). Hence @.child[conditions]. That way we move through filter, permissionEntries, values.
Finally we get to the values field and filter it for a child type field with the particular value Shared with the public. Hence @.type == "Shared with the public".
As you can see it's not very intuitive, and JSON path is a bit limited. If this is a repetitive issue, and your JSON is even more complicated, you may consider investing in a scriptable pre-processor (similar to the one explained here).
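If the JSONPath route proves too brittle, the same selection is straightforward in plain code. Here is a minimal Python sketch of the filter logic against an abbreviated version of the sample above (the function name is mine):

```python
import json

# The question's JSON sample, abbreviated to the fields the filter needs.
doc = json.loads("""
{
  "views": [
    {
      "id": 9701,
      "filter": {
        "permissionEntries": [
          {"values": [{"type": "Shared with the public", "name": ""}]}
        ]
      }
    },
    {}
  ]
}
""")

def matching_view_ids(doc):
    """Return ids of views having at least one permission-entry value
    whose type equals "Shared with the public"."""
    ids = []
    for view in doc.get("views", []):
        entries = view.get("filter", {}).get("permissionEntries", [])
        values = [v for e in entries for v in e.get("values", [])]
        if any(v.get("type") == "Shared with the public" for v in values):
            ids.append(view["id"])
    return ids

print(matching_view_ids(doc))  # [1, 2, 3]-style list of ids; here [9701]
```

In a JMeter context this could live in a JSR223 post-processor instead of the JSON Path Extractor.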

Can a lambda in an AWS Step Function know the "execution name" of the step function that launched it?

I have this step function that can sometimes fail, and I'd like to record this in a (Dynamo) DB. What would be handy is if I could just create a new error-handling step that would pick up the "execution name" from somewhere (I didn't find it in the context) and record this as a failure.
Is that possible?
AWS Step Functions recently released a feature called the context object.
Using $$ notation inside the Parameters block you can access information regarding your execution, including the execution name, ARN, state machine name, state machine ARN and others.
https://docs.aws.amazon.com/step-functions/latest/dg/input-output-contextobject.html
You can create a state to extract the context details that are then accessible to all the other states, such as:
{
  "StartAt": "ExtractContextDetails",
  "States": {
    "ExtractContextDetails": {
      "Parameters": {
        "arn.$": "$$.Execution.Id"
      },
      "Type": "Pass",
      "ResultPath": "$.contextDetails",
      "Next": "NextStateName"
    },
    ...
  }
}
Yes, it can, but it is not as straightforward as you might hope.
You are right to expect that a Lambda should be able to get the name of the calling state machine. Lambdas are passed a context object that returns information about the caller; however, that object is null when a state machine calls your Lambda. This means two things: you will have to work harder to get what you need, and this might be implemented in the future.
Right now, the only way I know of achieving this is by starting the execution of the state machine from within another Lambda and passing in the name in the input Json. Here is my code in Java...
String executionName = ...; // generate a unique name
StartExecutionRequest startExecutionRequest = new StartExecutionRequest()
        .withStateMachineArn(stateMachineArn)
        .withInput("{\"executionName\": \"" + executionName + "\"}") // note the escaped quotes
        .withName(executionName);
StartExecutionResult startExecutionResult = sf.startExecution(startExecutionRequest);
String executionArn = startExecutionResult.getExecutionArn();
If you do this, you will now have the name of your execution in the input JSON of your first step. If you want to use it in other steps, you should pass it around.
You might also want the ARN of the execution so you can call state machine methods from within your activities or tasks. You can construct the ARN yourself using the executionName:
arn:aws:states:us-east-1:accountid:execution:statemachinename:executionName
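As a sketch, assembling that ARN from its parts (all values below are placeholders):

```python
def execution_arn(region, account_id, state_machine, execution_name):
    # Build the execution ARN in the format shown above.
    return (f"arn:aws:states:{region}:{account_id}:"
            f"execution:{state_machine}:{execution_name}")

print(execution_arn("us-east-1", "123456789012", "statemachinename", "my-unique-run"))
# arn:aws:states:us-east-1:123456789012:execution:statemachinename:my-unique-run
```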
No. Unless you pass that information in the event, Lambda doesn't know whether or not it's part of a step function. Step functions orchestrate lambdas and maintain state between lambdas.
"States": {
  "Get Alter Query": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke",
    "OutputPath": "$.Payload",
    "Parameters": {
      "FunctionName": "arn:aws:lambda:ap-northeast-2:1111111:function:test-stepfuction:$LATEST",
      "Payload": {
        "body.$": "$",
        "context.$": "$$"
      }
    },
    "Retry": [
      {
        "ErrorEquals": [
          "Lambda.ServiceException",
          "Lambda.AWSLambdaException",
          "Lambda.SdkClientException",
          "Lambda.TooManyRequestsException"
        ],
        "IntervalSeconds": 2,
        "MaxAttempts": 6,
        "BackoffRate": 2
      }
    ],
    "Next": "Alter S3 Location"
  }
}
I solved it by adding context to the payload.
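On the Lambda side, the context object then arrives inside the event. A minimal Python handler for a state like the one above might read the execution name like this (the sample event is hand-built for illustration):

```python
def handler(event, _lambda_context=None):
    # The state's Parameters put the Step Functions context object under
    # "context" ("context.$": "$$"), so the execution name is nested in the event.
    return event["context"]["Execution"]["Name"]

# Hand-built event mimicking what such a state would deliver to the Lambda.
sample_event = {
    "body": {},
    "context": {
        "Execution": {
            "Id": "arn:aws:states:ap-northeast-2:1111111:execution:demo:run-001",
            "Name": "run-001",
        },
        "State": {"Name": "Get Alter Query"},
    },
}

print(handler(sample_event))  # run-001
```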
I highly recommend when using step functions to specify some sort of key in the step function configuration. For my step functions I always provide:
"ResultPath": "$",
"Parameters": {
  "source": "StepFunction",
  "type": "LAMBDA_METHOD_SWITCH_VALUE",
  "payload.$": "$"
},
And have each call to Lambda use the type field to determine what code to call. When your code fails, wrap it in a try/catch and explicitly use the passed-in type, which can be the name of the step, to determine what to do next.

Can a lambda in an AWS Step Function know the name of the step it is in?

For a lambda executed within a step function, I kind of expected that I could get the name of the current step from the lambda context, but it doesn't seem to be that simple.
Is there any way to get the name of the current step in a lambda that is executed within a Step Function?
UPDATE: as of 05/23/2019 this answer is outdated, since AWS introduced a way to access current step within a step function, see the accepted answer.
Looks like you are right: the current step doesn't get exposed through the context variable.
So the information that would allow you to identify which stage the state machine is currently in should be passed from the previous step (i.e. from the previous Lambda). This seems to be the most correct option.
Or, as a workaround, you could try inserting Pass states before calling your Lambda functions, to pass an id that helps you identify the current stage.
Suppose you have two steps in your state machine:
"Step1Test": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:xxxxxxxxxx:function:step1test",
  "Next": "Step2Test"
},
"Step2Test": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:xxxxxxxxxx:function:step2test",
  "End": true
}
Here is how you can provide your lambda functions with current step id passed via event.stepId
"Step1TestEnter": {
  "Type": "Pass",
  "Next": "Step1Test",
  "Result": "Step1Test",
  "ResultPath": "$.stepId"
},
"Step1Test": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:xxxxxxxxxx:function:step1test",
  "Next": "Step2TestEnter"
},
"Step2TestEnter": {
  "Type": "Pass",
  "Next": "Step2Test",
  "Result": "Step2Test",
  "ResultPath": "$.stepId"
},
"Step2Test": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:xxxxxxxxxx:function:step2test",
  "End": true
}
AWS Step Functions released the Context Object, through which you can access information about your execution.
You can use it to send the execution arn to your lambda.
https://docs.aws.amazon.com/step-functions/latest/dg/input-output-contextobject.html
Based on the documentation, I was using Parameters and $$ to pass the step function context object into my lambda function. To test and see if it was working, I thought I could go to the step function console, start a new execution and see the context object being passed into the step function on the "Step Input" tab. To my dismay, it wasn't displayed there. I added some diagnostic logging to the lambda function serializing the input to JSON and logging out to CloudWatch. Cloudwatch logs showed that the context object was being passed in.
Anyway, thought I would post this here to maybe help someone avoid the time I spent trying to figure this one out. It gets passed in, just doesn't show up in the step function console.
I highly recommend when using step functions to specify some sort of key in the step function configuration. For my step functions I always provide:
"ResultPath": "$",
"Parameters": {
  "source": "StepFunction",
  "type": "LAMBDA_METHOD_SWITCH_VALUE",
  "payload.$": "$"
},
And have each call to lambda use the type field to determine what code to call. I have found this to be much easier to implement.
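A minimal Python sketch of that dispatch pattern (the handler and type names below are made up for illustration):

```python
def handle_pdf(payload):
    # Example branch; real processing logic would go here.
    return {"handled": "pdf", "payload": payload}

# Map each "type" value used in the state machine's Parameters to a handler.
DISPATCH = {
    "LAMBDA_METHOD_SWITCH_VALUE": handle_pdf,
}

def handler(event, _context=None):
    # Route on the "type" key injected by the state's Parameters block.
    try:
        fn = DISPATCH[event["type"]]
    except KeyError:
        raise ValueError(f"unknown type: {event.get('type')!r}")
    return fn(event.get("payload"))

print(handler({"source": "StepFunction",
               "type": "LAMBDA_METHOD_SWITCH_VALUE",
               "payload": {"id": 1}}))
```

One Lambda can then serve several steps, with each step declaring its own type value in the state definition.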
