NiFi: Unable to extract values from JSON

Initially I am querying a table to get two values: id and payload. The payload field is itself JSON, but stored as a string. Please see the payload string below.
{
  "schema": "http://schemas.viacom.com/what/is/the/path#",
  "op": "delete",
  "entity": "movie",
  "entity_identifier": {
    "series_code": 53709,
    "episode_code": 1
  },
  "entity_vmid": "",
  "short_name": "",
  "title": ""
}
I want the series_code and episode_code values. I tried the approaches below, but with no luck:
ExecuteSQL --> ConvertAvrotoJSON --> EvaluateJSON($.payload.entity_identifier.series_code)
ExecuteSQL --> ConvertAvrotoJSON --> AttributestoJSON --> EvaluateJSON
Please help.

You can use the EvaluateJsonPath processor to evaluate JsonPath expressions against the content of the flowfile. You add one user-defined property per value you want to extract. Set the Destination value to flowfile-attribute to extract the value into an attribute which will be added to the flowfile, or flowfile-content to generate a new flowfile with the extracted value as the sole content.
Given the JSON you provided, the two path expressions you would use are:
$.entity_identifier.series_code -> 53709
$.entity_identifier.episode_code -> 1
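For example, a minimal EvaluateJsonPath configuration might look like this (the property names are illustrative; each user-defined property name becomes an attribute name):

Destination  = flowfile-attribute
series_code  = $.entity_identifier.series_code
episode_code = $.entity_identifier.episode_code

Note that these paths assume the flowfile content is the payload JSON itself. If the content is still the outer row JSON from ConvertAvroToJSON, with payload embedded as a string, you may first need to unwrap it, e.g. with an earlier EvaluateJsonPath whose property is $.payload and whose Destination is flowfile-content.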

Related

Transform date format inside CSV using Apache Nifi

I need to modify a CSV file in an Apache NiFi environment. My CSV file looks like this:
Advertiser ID,Campaign Start Date,Campaign End Date,Campaign Name
10730729,1/29/2020 3:00:00 AM,2/20/2020 3:00:00 AM,Nestle
40376079,2/1/2020 3:00:00 AM,4/1/2020 3:00:00 AM,Heinz
...
I want to transform the dates with AM/PM values to a simple date format: from 1/29/2020 3:00:00 AM to 2020-01-29 for each row. I read about the UpdateRecord processor, but there is a problem: as you can see, the CSV headers contain spaces, and I can't parse these fields with either Replacement Value Strategy (Literal or Record Path).
Any ideas how to solve this problem? Maybe I should somehow rename the headers from Advertiser ID to advertiser_id, etc.?
You don't need to make the transformation yourself; you can let your Readers and Writers handle it for you. To get the CSVReader to recognize dates, though, you will need to define a schema for your rows. Your schema would look something like this (I've removed the spaces from the column names because they are not allowed in Avro names):
{
  "type": "record",
  "name": "ExampleCSV",
  "namespace": "Stackoverflow",
  "fields": [
    {"name": "AdvertiserID", "type": "string"},
    {"name": "CampaignStartDate", "type": {"type": "long", "logicalType": "timestamp-micros"}},
    {"name": "CampaignEndDate", "type": {"type": "long", "logicalType": "timestamp-micros"}},
    {"name": "CampaignName", "type": "string"}
  ]
}
To configure the reader, set the following properties:
Schema Access Strategy = Use 'Schema Text' property
Schema Text = (Above codeblock)
Treat First Line as Header = True
Timestamp Format = "MM/dd/yyyy hh:mm:ss a"
Additionally, you can set the following property to ignore the header of the CSV if you don't want to, or are unable to, change the upstream system to remove the spaces:
Ignore CSV Header Column Names = True
Then in your CSVRecordSetWriter service you can specify the following:
Schema Access Strategy = Inherit Record Schema
Timestamp Format = "yyyy-MM-dd"
You can use UpdateRecord or ConvertRecord (or other processors, as long as they let you specify both a reader and a writer) and it will do the conversion for you. The difference between UpdateRecord and ConvertRecord is that UpdateRecord requires you to specify a user-defined property, so if this is the only change you are making, just use ConvertRecord. If you have other transformations, use UpdateRecord and make those changes at the same time.
Caveat: This will rewrite the file using the new column names (in my example, ones without spaces) so keep that in mind for downstream usage.
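As an illustration (not from the original answer), with the reader and writer configured as above, a ConvertRecord pass over the sample rows would produce something like:

AdvertiserID,CampaignStartDate,CampaignEndDate,CampaignName
10730729,2020-01-29,2020-02-20,Nestle
40376079,2020-02-01,2020-04-01,Heinz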

Nifi: Can I do mathematical operations on a json file values?

I have a JSON file as an input to a processor. Something like this:
{"x" : 10, "y" : 5}
Can I do mathematical operations on these values instead of writing a custom processor? I need to do something like
( x / y ) * 3
^ Just an example.
I need to save the result to an output file.
UPDATE:
This is my text in the GenerateFlowFile processor:
X|Y
1|123
2|111
And this is my AVRO schema:
{
  "name": "myschema",
  "namespace": "nifi",
  "type": "record",
  "fields": [
    {"name": "X", "type": "int"},
    {"name": "Y", "type": "int"}
  ]
}
When I change the above types to string, it works fine, but I cannot perform math operations on a string.
FYI, I have selected 'Use Schema Name Property' as the Schema Access Strategy.
Use the QueryRecord processor:
Configure/enable the Record Reader/Writer controller services.
Define an Avro schema to read the incoming JSON.
Define an Avro schema to write the results of the query in the desired format.
Add a new user-defined property in the QueryRecord processor with the query:
select ( x / y ) * 3 as div from FLOWFILE
The output flowfile from the query record processor will be in the configured Record Writer format.
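As a sketch of the full setup (the reader/writer names are the stock NiFi services; the property name div is illustrative):

Record Reader = JsonTreeReader (schema with numeric x and y fields)
Record Writer = JsonRecordSetWriter
div           = select ( x / y ) * 3 as div from FLOWFILE

Each user-defined property becomes an outgoing relationship named after the property. Note also that with int fields the division is integer division (1 / 123 yields 0); if you need fractional results, cast first, e.g. select (cast(x as double) / y) * 3 as div from FLOWFILE.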

NiFi - attributes to JSON - not able to generate the required JSON from an attribute

The flowfile content is
{
  "resourceType": "Patient",
  "myArray": [1, 2, 3, 4]
}
I use the EvaluateJsonPath processor to load "myArray" into an attribute named myArray.
Then I use the AttributesToJSON processor to create JSON from the myArray attribute.
But in the flowfile content, what I get is
{"myArray":"[1,2,3,4]"}
I expected the flowfile to have the following content.
{"myArray":[1,2,3,4]}
How can I get "myArray" as an array again in the content?
Use record-oriented processors, like the ConvertRecord processor, instead of the EvaluateJsonPath and AttributesToJSON processors.
Use JsonPathReader as the Record Reader.
JsonPathReader configs:
AvroSchemaRegistry:
{
  "namespace": "nifi",
  "name": "person",
  "type": "record",
  "fields": [
    {
      "name": "myArray",
      "type": {
        "type": "array",
        "items": "int"
      }
    }
  ]
}
JsonRecordSetWriter:
Use the same AvroSchemaRegistry controller service to access the schema.
To access the Avro schema, you need to set the schema.name attribute on the flowfile.
Output flowfile content would be
[{"myArray":[1,2,3,4]}]
Please refer to this link for how to configure the ConvertRecord processor.
(or)
If your desired output is {"myArray":[1,2,3,4]}, without the surrounding [] (array), then use the ReplaceText processor instead of the AttributesToJSON processor.
ReplaceText configs (see the sketch below):
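The original answer's configuration screenshot is not included here; as a minimal sketch (my assumption of the intended settings), ReplaceText can rebuild the content from the myArray attribute extracted earlier:

Replacement Strategy = Always Replace
Evaluation Mode      = Entire text
Replacement Value    = {"myArray":${myArray}}

With the myArray attribute holding [1,2,3,4], this replaces the flowfile content with {"myArray":[1,2,3,4]}.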
Not all credit goes to me, but I was pointed to a better, simpler way to achieve this. There are two ways.
Solution 1 - the simplest and most elegant
Use the NiFi JoltTransformJSON processor. The processor can make use of NiFi Expression Language and attributes on both the left- and right-hand side of the specification syntax. This allows you to quickly use the JOLT default spec to add new fields (from flowfile attributes) to a new or existing JSON.
Ex:
{"customer_id": 1234567, "vckey_list": ["test value"]}
Both of those field values are stored in flowfile attributes as a result of an EvaluateJsonPath operation; assume they are named "customer_id_attr" and "vckey_list_attr". We can simply generate a new JSON from those flowfile attributes with the "default" JOLT spec and the right-hand syntax. You can even add additional Expression Language functions to the processing:
[
  {
    "operation": "default",
    "spec": {
      "customer_id": ${customer_id_attr},
      "vckey_list": ${vckey_list_attr:toLower()}
    }
  }
]
This worked for me even when storing the entire JSON, path of "$", in a flow-file attribute.
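One caveat worth noting (my addition): the attribute values are substituted into the spec verbatim before the spec is parsed, so they must themselves be valid JSON fragments. In the example above, customer_id_attr would hold 1234567 and vckey_list_attr would hold ["test value"] (brackets included), and the transform would emit:

{"customer_id":1234567,"vckey_list":["test value"]}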
Solution 2 - more complicated and uglier
Use a sequence of NiFi ReplaceText processors. First use a ReplaceText processor to append the desired flowfile attribute to the file content.
(screenshot: replace_text_processor_1)
If you are generating a totally new JSON, this would do it. If you are trying to modify an existing one, you would need to first append the desired keys, then use ReplaceText again to properly format them as new keys in the existing JSON, going from
{"original_json_key": original_json_obj}{"customer_id": 1234567, "vckey_list": ["test value"]}
to
{"original_json_key": original_json_obj, "customer_id": 1234567, "vckey_list": ["test value"]}
using
(screenshot: replace_text_processor_2)
Then use JOLT to do further processing (that's why Solution 1 always makes sense).
Hope this helps; I spent about half a day figuring out the second solution and was pointed to Solution 1 by someone with more experience in NiFi.

Delete empty attributes in NiFi

Because this issue is still unresolved, I have an EvaluateJsonPath processor that sometimes outputs attributes with empty strings.
Is there a straightforward way to delete attributes from a flowfile?
I tried using the UpdateAttribute processor, but it is only able to delete based on matching an attribute's name (I need to match on the attribute's value).
You can use the ExecuteGroovyScript processor (available since NiFi 1.5.0) with the following code:
def ff = session.get()
if (!ff) return
// collect the names of all attributes whose value is null or an empty string
def emptyKeys = ff.getAttributes().findAll { it.value == null || it.value == '' }.collect { it.key }
// remove those attributes and transfer the flowfile to the success relationship
ff.removeAllAttributes(emptyKeys)
REL_SUCCESS << ff
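To illustrate (my example, not from the original answer): a flowfile with attributes a = "1", b = "", c = "" would leave the processor with only a remaining, along with the standard attributes such as uuid, filename, and path, which are never empty.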
After the EvaluateJsonPath processor, use a RouteOnAttribute processor and check for attributes having empty values using Expression Language.
RouteOnAttribute configs:
Add a new property:
emptyattribute
${anyAttribute("id","age"):isEmpty()}
or by using the or function:
${id:isEmpty():or(${age:isEmpty()})}
The above expression checks whether either the id or age attribute has an empty value, and routes those flowfiles to the emptyattribute relationship.
${allAttributes("id","age"):isEmpty()}
Or by using the and function:
${id:isEmpty():and(${age:isEmpty()})}
This expression routes only when both the id and age attributes are empty.
Connect the emptyattribute relationship to an UpdateAttribute processor and delete the attributes that you want to delete.
UpdateAttribute configs:
In the Delete Attributes Expression, mention the id and age attributes that need to be deleted.
By using RouteOnAttribute after the EvaluateJsonPath processor we can check whether the required attributes have values; then, using UpdateAttribute, we can delete the attributes that have empty values.
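The UpdateAttribute screenshot is not reproduced here; the relevant property is Delete Attributes Expression, which takes a regular expression matching the attribute names to drop, for example:

Delete Attributes Expression = (id|age)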
You can use a JOLT transform, but I can only get it to work for fields at the top level of the JSON. Any nested fields are lost, although perhaps some real JOLT expert can improve on the solution to stop that happening.
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "": "TRASH",
        "*": {
          "$": "&2"
        }
      }
    }
  },
  {
    "operation": "remove",
    "spec": {
      "TRASH": ""
    }
  }
]
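To sketch the effect (my summary, not from the original answer): the shift operation copies every top-level field through unchanged but routes any field whose value is an empty string into a TRASH field, and the remove operation then deletes TRASH. For example, an input of {"id":"","name":"John","age":""} would come out as {"name":"John"}.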
Once you have validated which required attribute values contain empty strings, you can make use of UpdateAttribute's Advanced Usage to change those values to null. For advanced usage of UpdateAttribute, refer to this link: community.hortonworks.com/questions/141774/… Add Rule: idnull; Conditions: ${id:isEmpty():or(${id:isNull()})}; Actions: set the id attribute to the value null.
Note that this approach does not remove the attribute; it just sets the attribute value to null.

UpdateRecord processor returns empty string: converting string date to long in NiFi

I am converting a string date with the format mentioned in the image to a number (long), but the output I get is an empty string.
I am using a JSON reader and writer, where the field is a string in the input JSON and of type long in the output JSON.
I tried keeping the output JSON type as a string and evaluating the following expression, but that also produced an empty string:
${DATE1.value:toDate('yyyy-MM-dd HH:mm:ss'):toNumber():toString()}
Sample data trying to convert: {"DATE1" : "2018-01-17 00:00:00"}
I tried to follow the solution at this link but am still getting an empty string.
Method 1: Referring to the contents of the flowfile
If you want to change the DATE1 value based on the field value from the content, then you need to refer to it as field.value.
Replacement Value Strategy = Literal Value
//DATE1 = ${field.value:toDate('yyyy-MM-dd HH:mm:ss'):toNumber()}
This refers to the DATE1 value from the content and then applies Expression Language to it.
Avro Schema Registry:-
{ "namespace": "nifi", "name": "balances", "type": "record",
"fields": [
{ "name": "DATE1", "type": "string"} ] }
Read DATE1 field value as String from the content.
JsonRecordSetWriter:
{
  "namespace": "nifi",
  "name": "balances",
  "type": "record",
  "fields": [
    { "name": "DATE1", "type": "long" }
  ]
}
In the writer, configure DATE1 as long type.
Input:
{"DATE1":"2018-01-17 00:00:00"}
Output:
[{"DATE1":1516165200000}]
(or)
Method 2: Referring to an attribute of the flowfile
If you have DATE1 as an attribute of the flowfile with the value 2018-01-17 00:00:00, we use the DATE1 attribute instead of field.value (which refers to the contents of the flowfile).
Then the UpdateRecord configs would be:
Replacement Value Strategy = Literal Value
//DATE1 = ${DATE1:toDate('yyyy-MM-dd HH:mm:ss'):toNumber()}
In this expression we use the DATE1 attribute to update the contents of the flowfile.
Both methods will result in the same output.
Check the value before converting it to a date by using isEmpty() and ifElse():
${field.value:isEmpty():ifElse('', ${field.value:toDate('yyyy-MM-dd HH:mm:ss'):toNumber()})}
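As an illustration of the guard (my note, not from the original answer): an input of {"DATE1":""} now passes through with an empty value instead of failing the conversion, while {"DATE1":"2018-01-17 00:00:00"} is still converted to 1516165200000 as shown above.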
