Failed to run the pipeline (Pipeline) - Azure Data Factory - debugging

For testing purposes I'm trying to execute this simple pipeline (nothing sophisticated).
However, I'm getting this error:
{"code":"BadRequest","message":null,"target":"pipeline//runid/cb841f14-6fdd-43aa-a9c1-4619dab28cdd","details":null,"error":null}
The goal is to see if two variables are getting the right values (we have been facing some issues in our production environment).
This is the JSON definition of the pipeline:
{
    "name": "GeneralTest",
    "properties": {
        "activities": [
            {
                "name": "Set variable1",
                "type": "SetVariable",
                "dependsOn": [],
                "userProperties": [],
                "typeProperties": {
                    "variableName": "start_time",
                    "value": {
                        "value": "@utcnow()",
                        "type": "Expression"
                    }
                }
            },
            {
                "name": "Wait1",
                "type": "Wait",
                "dependsOn": [
                    {
                        "activity": "Set variable1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "waitTimeInSeconds": 5
                }
            },
            {
                "name": "Set variable2",
                "description": "",
                "type": "SetVariable",
                "dependsOn": [
                    {
                        "activity": "Wait1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "variableName": "end_time ",
                    "value": {
                        "value": "@utcnow()",
                        "type": "Expression"
                    }
                }
            }
        ],
        "variables": {
            "start_time": {
                "type": "String"
            },
            "end_time ": {
                "type": "String"
            }
        },
        "folder": {
            "name": "Old Pipelines"
        },
        "annotations": []
    }
}
What am I missing, or what could be the problem with this process?

You have a blank space after the variable name end_time, i.e. "end_time ".
You can see the difference in my repro (my code vs. your code).
Clearing that blank space makes the execution run just fine.
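For reference, once the trailing space is removed, both the variable declaration and the "variableName" in the Set variable2 activity should read "end_time":
"variables": {
    "start_time": {
        "type": "String"
    },
    "end_time": {
        "type": "String"
    }
}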

I faced a similar issue when doing a Debug run of one of my pipelines. The error messages for these types of errors are not helpful when running in Debug mode.
What I have found is that if you publish the pipeline and then Trigger a Pipeline Run (instead of a Debug run), you can then go to Monitor Pipeline Runs and it will show you a more useful error message.

Apart from possible blank spaces in variable or parameter names, Data Factory also doesn't like hyphens, but only in parameter names; variable names are fine.
Validation passes, but at debug time you get the same cryptic error.
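As a minimal illustration (the parameter name here is hypothetical), a pipeline parameter declared as "source-folder" will validate but fail at debug time with the same BadRequest error, whereas an underscore or camelCase name runs fine:
"parameters": {
    "source_folder": {
        "type": "String",
        "defaultValue": "landing/incoming"
    }
}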

I ran into this same error message in Data Factory today on the Copy activity. Everything passed validation, but this error would pop up on each debug run.
I have parameters configured on my dataset connections so that I can use dynamic queries against the data sources. In this case I was using explicit queries, so the parameters appeared irrelevant. I tried both blank values and null values; both failed the same way.
I then tried dummy but real text values and it worked! The pipeline doesn't use those dummy values for any work, so their content doesn't matter, but some part of the engine needs a non-null value in the parameters in order to execute.
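As a rough sketch of that workaround (dataset and parameter names here are hypothetical), passing any non-empty placeholder text in the Copy activity's dataset reference was enough to get past the error:
"inputs": [
    {
        "referenceName": "SourceDataset",
        "type": "DatasetReference",
        "parameters": {
            "sourceQuery": "placeholder"
        }
    }
]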

Related

How to ensure a Step Function executes Parameterized Query properly in AWS?

I'm currently trying to execute an Athena Query during a State Machine. The query itself needs a date variable to use in several WHERE statements so I'm using a Lambda to generate it.
When I run EXECUTE prepared-statement USING 'date', 'date', 'date'; directly in Athena, I get the results I expect, so I know the query is formed correctly. But when I try to do it in the state machine, it gives me the following error:
SYNTAX_ERROR: line 19:37: Unexpected parameters (integer) for function date. Expected: date(varchar(x)) , date(timestamp) , date(timestamp with time zone)
So my best guess is that I'm somehow not passing the execution parameters correctly.
The Lambda that calculates the date returns it in a string with the format %Y-%m-%d, and in the State Machine I make sure to pass it to the output of every State that needs it. Then I get a named query to create a prepared statement from within the state machine. I then use that prepared statement to run an EXECUTE query that requires the date multiple times, so I use an intrinsic function to turn it into an array:
{
    "StartAt": "calculate_date",
    "States": {
        "calculate_date": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
                "Payload.$": "$",
                "FunctionName": "arn:aws:lambda:::function:calculate_date:$LATEST"
            },
            "Retry": [
                {
                    "ErrorEquals": [
                        "Lambda.ServiceException",
                        "Lambda.AWSLambdaException",
                        "Lambda.SdkClientException",
                        "Lambda.TooManyRequestsException"
                    ],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 6,
                    "BackoffRate": 2
                }
            ],
            "Next": "get_query",
            "ResultSelector": {
                "ExecDate.$": "$.Payload.body.exec_date"
            }
        },
        "get_query": {
            "Type": "Task",
            "Next": "prepare_query",
            "Parameters": {
                "NamedQueryId": "abc123"
            },
            "Resource": "arn:aws:states:::aws-sdk:athena:getNamedQuery",
            "ResultPath": "$.Payload"
        },
        "prepare_query": {
            "Type": "Task",
            "Next": "execute_query",
            "Parameters": {
                "QueryStatement.$": "$.Payload.NamedQuery.QueryString",
                "StatementName": "PreparedStatementName",
                "WorkGroup": "athena-workgroup"
            },
            "Resource": "arn:aws:states:::aws-sdk:athena:createPreparedStatement",
            "ResultPath": "$.Payload"
        },
        "execute_query": {
            "Type": "Task",
            "Resource": "arn:aws:states:::athena:startQueryExecution",
            "Parameters": {
                "ExecutionParameters.$": "States.Array($.ExecDate, $.ExecDate, $.ExecDate)",
                "QueryExecutionContext": {
                    "Catalog": "catalog_name",
                    "Database": "database_name"
                },
                "QueryString": "EXECUTE PreparedStatementName",
                "WorkGroup": "athena-workgroup",
                "ResultConfiguration": {
                    "OutputLocation": "s3://bucket"
                }
            },
            "End": true
        }
    }
}
The execution of the State Machine returns successfully, but the query doesn't export the results to the bucket, and when I click on the "Athena query execution" link in the list of events, it takes me to the Athena editor page, where I see the error listed above:
https://i.stack.imgur.com/pxxOm.png
Am I generating the ExecutionParameters wrong? Does the createPreparedStatement resource need a different syntax for the query parameters? I'm truly at a loss here, so any help is greatly appreciated.
I just solved my problem. And I'm posting this answer in case anyone comes across the same issue.
Apparently, the ExecutionParameters parameter of an Athena StartQueryExecution state does not respect the variable type of a JSONPath variable, so you need to manually add the single quotes when forming your array. I solved this by adding a secondary output from the Lambda, with the date wrapped in single quotes, so that when I build the array using intrinsic functions and pass it to the query execution, it forms the query string correctly.
Transform the output from the lambda like this:
"ExecDateQuery.$": "States.Format('\\'{}\\'', $.Payload.body.exec_date)"
And use ExecDateQuery in the array intrinsic function, instead of ExecDate.
The final State Machine would look like this:
{
    "StartAt": "calculate_date",
    "States": {
        "calculate_date": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
                "Payload.$": "$",
                "FunctionName": "arn:aws:lambda:::function:calculate_date:$LATEST"
            },
            "Retry": [
                {
                    "ErrorEquals": [
                        "Lambda.ServiceException",
                        "Lambda.AWSLambdaException",
                        "Lambda.SdkClientException",
                        "Lambda.TooManyRequestsException"
                    ],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 6,
                    "BackoffRate": 2
                }
            ],
            "Next": "get_query",
            "ResultSelector": {
                "ExecDate.$": "$.Payload.body.exec_date",
                "ExecDateQuery.$": "States.Format('\\'{}\\'', $.Payload.body.exec_date)"
            }
        },
        "get_query": {
            "Type": "Task",
            "Next": "prepare_query",
            "Parameters": {
                "NamedQueryId": "abc123"
            },
            "Resource": "arn:aws:states:::aws-sdk:athena:getNamedQuery",
            "ResultPath": "$.Payload"
        },
        "prepare_query": {
            "Type": "Task",
            "Next": "execute_query",
            "Parameters": {
                "QueryStatement.$": "$.Payload.NamedQuery.QueryString",
                "StatementName": "PreparedStatementName",
                "WorkGroup": "athena-workgroup"
            },
            "Resource": "arn:aws:states:::aws-sdk:athena:createPreparedStatement",
            "ResultPath": "$.Payload"
        },
        "execute_query": {
            "Type": "Task",
            "Resource": "arn:aws:states:::athena:startQueryExecution",
            "Parameters": {
                "ExecutionParameters.$": "States.Array($.ExecDateQuery, $.ExecDateQuery, $.ExecDateQuery)",
                "QueryExecutionContext": {
                    "Catalog": "catalog_name",
                    "Database": "database_name"
                },
                "QueryString": "EXECUTE PreparedStatementName",
                "WorkGroup": "athena-workgroup",
                "ResultConfiguration": {
                    "OutputLocation": "s3://bucket"
                }
            },
            "End": true
        }
    }
}

Merge profile based on 2 property in Apache-Unomi

I am trying to build custom logic in an action for profile merging. Can anybody suggest how to create a rule where I can merge profiles based on email and phone number? As of now I am able to do it with only one property value, email. You can find the sample rule below:
"metadata": {
"id": "exampleLogin",
"name": "Example Login",
"description": "Copy event properties to profile properties on login"
},
"condition": {
"parameterValues": {
"subConditions": [
{
"type": "eventTypeCondition",
"parameterValues": {
"eventTypeId": "click"
}
}
],
"operator": "and"
},
"type": "booleanCondition"
},
"actions": [
{
"parameterValues": {
"mergeProfilePropertyValue": "eventProperty::target.properties(email)",
"mergeProfilePropertyName": "mergeIdentifier"
},
"type": "mergeProfilesOnPropertyAction"
},
{
"parameterValues": {
},
"type": "allEventToProfilePropertiesAction"
}
]
}
In order to be able to merge based on multiple identifiers you would have to extend the default built-in action to support that.
This can be done by creating a module but it will require some Java knowledge since this is how Unomi is implemented.
The code for the default merge action is available here:
https://github.com/apache/unomi/blob/master/plugins/baseplugin/src/main/java/org/apache/unomi/plugins/baseplugin/actions/MergeProfilesOnPropertyAction.java
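Purely as a hypothetical sketch of where that leads: once you have implemented and registered such an extended action in your module (the action type and parameter names below are made up and depend entirely on your implementation), the rule could pass both identifiers to it:
{
    "parameterValues": {
        "mergeProfilePropertyValues": [
            "eventProperty::target.properties(email)",
            "eventProperty::target.properties(phoneNumber)"
        ],
        "mergeProfilePropertyName": "mergeIdentifier"
    },
    "type": "mergeProfilesOnPropertiesAction"
}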

Getting started with Step Functions and I can't get Choice to work correctly

I'm putting together my first step function, and as a newbie to this I am struggling to make it work right. The documentation on AWS is helpful but lacks examples of what I am trying to understand. I found a couple of similar issues on the site here, but they didn't really answer my question either.
I have a test Step Function that works really simply. I have a small Lambda function that kicks out a single-line JSON with a "Count" from a request against a DynamoDB table:
import json

import boto3
from boto3.dynamodb.conditions import Key


def lambda_handler(event, context):
    """lambda_handler

    Keyword arguments:
    event -- dict -- A dict of parameters to be validated.
    context --

    Return:
    json object with the hubID from DynamoDB of the new hub.

    Exceptions:
    None
    """
    # Prep the Boto3 resources needed
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('TransitHubs')

    # By default we assume there are no new hubs
    newhub = {'Count': 0}

    # Query the DynamoDB to see if the name exists or not:
    response = table.query(
        IndexName='Status-index',
        KeyConditionExpression=Key('Status').eq("NEW"),
        Limit=1
    )

    if response['Count']:
        newhub['Count'] = response['Count']

    return json.dumps(newhub)
A normal output would be:
{ "Count": 1 }
And then I create this Step Function:
{
    "StartAt": "Task",
    "States": {
        "Task": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-west-2:OMGSUPERSECRET:function:LaunchNode-get_new_hubs",
            "TimeoutSeconds": 60,
            "Next": "Choice"
        },
        "Choice": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.Count",
                    "NumericEquals": 0,
                    "Next": "Failed"
                },
                {
                    "Variable": "$.Count",
                    "NumericEquals": 1,
                    "Next": "Succeed"
                }
            ]
        },
        "Succeed": {
            "Type": "Succeed"
        },
        "Failed": {
            "Type": "Fail"
        }
    }
}
So I kick off the State Function and I get this output:
TaskStateExited
{
    "name": "Task",
    "output": {
        "Count": 1
    }
}
ChoiceStateEntered
{
    "name": "Choice",
    "input": {
        "Count": 1
    }
}
ExecutionFailed
{
    "error": "States.Runtime",
    "cause": "An error occurred while executing the state 'Choice' (entered at the event id #7). Invalid path '$.Count': The choice state's condition path references an invalid value."
}
So my question: I don't get why this is failing with that error message. Shouldn't Choice just pick up that value from the JSON? Isn't the default "$" input the "input" path?
I have figured out what the issue is, and here it is:
In my Python code, I was attempting to json.dumps(newhub) for the response, thinking that what I needed was a string output representing the JSON-formatted response. But it appears that is incorrect. When I alter the code to simply return newhub and hand back the dict, the Step Functions process accepts it correctly. I'm assuming it parses the dict to JSON for me? The output difference is clearly visible:
Old Task output from above, returning json.dumps(newhub):
{
    "name": "Task",
    "output": {
        "Count": 1
    }
}
New Task output from above, returning newhub:
{
    "Count": 1
}
And the Choice now correctly matches the Count variable in my output.
In case this is helpful for someone else: I also experienced the same kind of error you did (copying yours below):
{
    "error": "States.Runtime",
    "cause": "An error occurred while executing the state 'Choice' (entered at the event id #7). Invalid path '$.Count': The choice state's condition path references an invalid value."
}
In my case, though, the problem turned out to be that I was missing the "Count" key altogether, and I did not want verbose payloads.
Per reading the docs, I discovered I can also do this:
"Choice": {
"Type": "Choice",
"Choices": [
{
"And": [
{
"Variable": "$.Count",
"IsPresent": true
},
{
"Variable": "$.Count",
"NumericEquals": 0,
}
],
"Next": "Failed"
},
{
"And": [
{
"Variable": "$.Count",
"IsPresent": true
},
{
"Variable": "$.Count",
"NumericEquals": 1,
}
],
"Next": "Succeed"
}
]
},
I was facing the same issue as well. Set the ResultPath to the same path as the InputPath, and reference the choice variable through that path (e.g. $.lambda.cursor below):
{
    "Comment": "Step function to execute lengthy synchronous requests",
    "StartAt": "StartProcessing",
    "States": {
        "StartProcessing": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:703569030910:function:shravanthDemo-shravanthGetItems-GHF7ZA1p6auQ",
            "InputPath": "$.lambda",
            "ResultPath": "$.lambda",
            "Next": "VerifyProcessor"
        },
        "VerifyProcessor": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.lambda.cursor",
                    "IsNull": false,
                    "Next": "StartProcessing"
                },
                {
                    "Variable": "$.lambda.cursor",
                    "IsNull": true,
                    "Next": "EndOfUpdate"
                }
            ]
        },
        "EndOfUpdate": {
            "Type": "Pass",
            "Result": "ProcessingComplete",
            "End": true
        }
    }
}

Jira Trigger plugin for Jenkins: Obtaining Value from issue object

I'm trying to set up some automation using Jenkins and the Jira Trigger plugin for Jenkins. I've set up a webhook on the Jira side which is able to invoke a build on Jenkins as desired.
However, I'm having trouble obtaining the value of a custom field from the incoming Jira request.
I'm trying to use the "Issue Attribute Path" feature and I've also referred to this post (How to use the 'Issue attribute path' in the parameter mapping of jenkins-trigger-plugin).
I'm still unable to get the value of the custom fields customfield_10010 and customfield_10011. I've tried mappings like fields.customfield_10010, fields.customfield_10010.value, fields.customfield_10010.0.value, customfield_10010.0.value, and similar combinations. I'm able to get the value of other standard fields as suggested in the plugin help, e.g. status.name, description, etc.
I could not get any clue from the Jira documentation site either.
Part of the incoming JSON data is below for easy reference.
"issue": {
"id": "1000x",
"self": "http://localhost:3080/rest/api/2/issue/10007",
"key": "ABC-2",
"fields": {
"issuetype": {
..
},
"parent": {
..
},
"components": [
],
"timespent": null,
"timeoriginalestimate": 28800,
"description": ".....",
"project": {
..
},
"customfield_10010": [
{
"self": "http://localhost:3080/rest/api/2/customFieldOption/10019",
"value": "ABC-Custom 1",
"id": "10019"
}
],
"fixVersions": [
],
"customfield_10011": [
{
"self": "http://localhost:3080/rest/api/2/customFieldOption/10021",
"value": "ABC-Custom 2",
"id": "10021"
}
],
.....
....
....
}
}
You can get the value of a custom field with the following syntax:
fields.find { it.id == "customfield_10010" }.value
I had the same problem and found this solution here:
https://issues.jenkins-ci.org/browse/JENKINS-13216

Automating Hive Activity using aws

I would like to automate my Hive script to run every day, and to do that I have the option of Data Pipeline. The problem is that I am exporting data from DynamoDB to S3 and then manipulating that data with a Hive script. I specify the input and output inside the script file, and that's where the problem starts, because a HiveActivity has to have an input and an output, but I have to give them in the script file.
I am trying to find a way to automate this Hive script and am open to ideas.
Cheers,
You can disable staging on Hive Activity to run any arbitrary Hive Script.
stage = false
Do something like:
{
    "name": "DefaultActivity1",
    "id": "ActivityId_1",
    "type": "HiveActivity",
    "stage": "false",
    "scriptUri": "s3://bucket/query.hql",
    "scriptVariable": [
        "param1=value1",
        "param2=value2"
    ],
    "schedule": {
        "ref": "ScheduleId_1"
    },
    "runsOn": {
        "ref": "EmrClusterId_1"
    }
},
Another alternative to the HiveActivity is to use an EmrActivity, as in the following example:
{
    "schedule": {
        "ref": "DefaultSchedule"
    },
    "name": "EMR Activity name",
    "step": "command-runner.jar,hive-script,--run-hive-script,--args,-f,s3://bucket/path/query.hql",
    "runsOn": {
        "ref": "EmrClusterId"
    },
    "id": "EmrActivityId",
    "type": "EmrActivity"
}
