I have a JSON file as an input to a processor. Something like this:
{"x" : 10, "y" : 5}
Can I do mathematical operations on these values instead of writing a custom processor? I need to do something like
( x / y ) * 3
(This is just an example.)
I need to save the result to an output file.
UPDATE:
This is my text in the GenerateFlowFile processor:
X|Y
1|123
2|111
And this is my Avro schema:
{
"name": "myschema",
"namespace": "nifi",
"type": "record",
"fields": [
{"name": "X" , "type": "int"},
{"name": "Y" , "type": "int"} ]
}
When I change the above types to string, it works fine but I cannot perform math operations on a string.
FYI, I have selected 'Use Schema Name Property' as the Schema Access Strategy.
Use the QueryRecord processor:
Configure and enable the Record Reader/Writer controller services.
Define an Avro schema to read the incoming JSON.
Define an Avro schema to write the query results in the desired format.
Add a new user-defined property to the QueryRecord processor (for example, named sql) whose value is the query:
select ( x / y ) * 3 as div from FLOWFILE
The output flowfile from the QueryRecord processor will be in the configured Record Writer's format.
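To make the query's arithmetic concrete, here is a plain-Python sketch (not NiFi code) of what select ( x / y ) * 3 as div from FLOWFILE computes for each record. The function name query_div and its integer_division flag are hypothetical. One assumption worth verifying in your NiFi version: when X and Y are declared as int, Calcite's / performs integer division, so a row like 1|123 would give 0. Casting one operand in the query, e.g. CAST(x AS DOUBLE), should avoid that.

```python
import json

def query_div(records, integer_division=True):
    """Mimic `select (x / y) * 3 as div` for each record (illustration only)."""
    results = []
    for rec in records:
        x, y = rec["x"], rec["y"]
        if integer_division:
            # INTEGER / INTEGER in SQL truncates toward zero (assumed Calcite behaviour)
            quotient = abs(x) // abs(y)
            if (x < 0) != (y < 0):
                quotient = -quotient
        else:
            # Floating-point division, as with CAST(x AS DOUBLE) in the query
            quotient = x / y
        results.append({"div": quotient * 3})
    return results

if __name__ == "__main__":
    records = json.loads('[{"x": 10, "y": 5}, {"x": 1, "y": 123}]')
    print(json.dumps(query_div(records)))
    print(json.dumps(query_div(records, integer_division=False)))
```

The integer branch uses integer arithmetic rather than int(x / y) so that large values are not rounded through a float.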
I split a comma-separated line into field 1 and field 2 and want to concatenate field1 and field 2 [3 word first].
Example:
2022-09-05T00:00:10,677 abc.1 ,
After split and concatenate:
2022-09-05T00:00:10:677,abc.1,
You can use UpdateRecord, set its Replacement Value Strategy to Record Path Value, and add a user-defined property such as /field3 set to concat( /field1, /field2 ). You can change /field3 to whatever you want the output field name to be. If you want to remove the other fields, you can specify a schema in your Record Writer that only has the field(s) you want, such as:
{
"type": "record",
"name": "nifiRecord",
"namespace": "org.apache.nifi",
"fields": [{
"name": "field3",
"type": ["string", "null"]
}]
}
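A rough Python model of the two steps above, assuming the source fields are present and non-null strings: UpdateRecord adds field3 as the concatenation of field1 and field2, and the writer schema then keeps only the fields it declares. The helper names and the sample record are hypothetical; this models the effect, not how NiFi is implemented.

```python
def update_record(record, target="field3", sources=("field1", "field2")):
    """Model of UpdateRecord with /field3 = concat( /field1, /field2 )."""
    updated = dict(record)
    # Assumes every source field is present and non-null
    updated[target] = "".join(record[name] for name in sources)
    return updated

def project(record, writer_fields):
    """Model of a Record Writer schema that declares only some fields.

    Fields missing from the record come out as None (null), matching a
    ["string", "null"] type in the writer schema.
    """
    return {name: record.get(name) for name in writer_fields}

if __name__ == "__main__":
    rec = {"field1": "abc", "field2": "123"}  # hypothetical input record
    print(project(update_record(rec), ["field3"]))
```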
I need to modify a CSV file in an Apache NiFi environment.
My CSV file looks like this:
Advertiser ID,Campaign Start Date,Campaign End Date,Campaign Name
10730729,1/29/2020 3:00:00 AM,2/20/2020 3:00:00 AM,Nestle
40376079,2/1/2020 3:00:00 AM,4/1/2020 3:00:00 AM,Heinz
...
I want to transform dates with AM/PM values into a simple date format for each row, e.g. from 1/29/2020 3:00:00 AM to 2020-01-29. I read about the UpdateRecord processor, but there is a problem: as you can see, the CSV headers contain spaces, and I can't even parse these fields with either Replacement Value Strategy (Literal Value or Record Path Value).
Any ideas on how to solve this problem? Should I somehow rename the headers, e.g. from Advertiser ID to advertiser_id?
You don't need to actually make the transformation yourself, you can let your Readers and Writers handle it for you. To get the CSV Reader to recognize dates though, you will need to define a schema for your rows. Your schema would look something like this (I've removed the spaces from the column names because they are not allowed):
{
"type": "record",
"name": "ExampleCSV",
"namespace": "Stackoverflow",
"fields": [
{"name": "AdvertiserID", "type": "string"},
{"name": "CampaignStartDate", "type" : {"type": "long", "logicalType" : "timestamp-micros"}},
{"name": "CampaignEndDate", "type" : {"type": "long", "logicalType" : "timestamp-micros"}},
{"name": "CampaignName", "type": "string"}
]
}
To configure the reader, set the following properties:
Schema Access Strategy = Use 'Schema Text' property
Schema Text = (Above codeblock)
Treat First Line as Header = True
Timestamp Format = "MM/dd/yyyy hh:mm:ss a"
Additionally, you can set the following property to ignore the CSV header if you don't want to (or are unable to) change the upstream system to remove the spaces:
Ignore CSV Header Column Names = True
Then in your CSVRecordSetWriter service you can specify the following:
Schema Access Strategy = Inherit Record Schema
Timestamp Format = "yyyy-MM-dd"
You can use UpdateRecord or ConvertRecord (or other processors, as long as they allow you to specify both a reader and a writer), and the conversion will be done for you. The difference between UpdateRecord and ConvertRecord is that UpdateRecord requires you to specify a user-defined property, so if this is the only change you will make, just use ConvertRecord. If you have other transformations, use UpdateRecord and make those changes at the same time.
Caveat: This will rewrite the file using the new column names (in my example, ones without spaces) so keep that in mind for downstream usage.
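To illustrate what the reader/writer pair does (drop the spaces from the column names via the schema, parse the AM/PM timestamps, write them back as yyyy-MM-dd), here is a standalone Python sketch. It is not NiFi code, and convert_csv is a hypothetical name. The Python pattern %m/%d/%Y %I:%M:%S %p is my assumed analogue of the Java pattern above. Python's strptime accepts single-digit months and hours such as 1/29/2020 3:00:00 AM, whereas a strict Java formatter may need M/d/yyyy h:mm:ss a instead. That is worth checking in your NiFi version.

```python
import csv
import io
from datetime import datetime

READ_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed Python analogue of "MM/dd/yyyy hh:mm:ss a"
WRITE_FORMAT = "%Y-%m-%d"             # Python analogue of "yyyy-MM-dd"
DATE_COLUMNS = ("CampaignStartDate", "CampaignEndDate")

def convert_csv(csv_text):
    """Rename headers without spaces and reformat the two date columns."""
    reader = csv.reader(io.StringIO(csv_text))
    # Drop spaces from the header names, matching the Avro field names
    fields = [name.replace(" ", "") for name in next(reader)]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for row in reader:
        record = dict(zip(fields, row))
        for column in DATE_COLUMNS:
            parsed = datetime.strptime(record[column], READ_FORMAT)
            record[column] = parsed.strftime(WRITE_FORMAT)
        writer.writerow([record[name] for name in fields])
    return out.getvalue()

if __name__ == "__main__":
    sample = (
        "Advertiser ID,Campaign Start Date,Campaign End Date,Campaign Name\n"
        "10730729,1/29/2020 3:00:00 AM,2/20/2020 3:00:00 AM,Nestle\n"
        "40376079,2/1/2020 3:00:00 AM,4/1/2020 3:00:00 AM,Heinz\n"
    )
    print(convert_csv(sample), end="")
```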
I have the sample CSV data below, with several records per id. I want to convert it to the JSON format shown below. I am using NiFi 1.8.
CSV:
id,name,category,status,country
1,XXX,ABC,Active,USA
1,XXX,DEF,Active,HKG
1,XXX,XYZ,Active,USA
Expected JSON:
{
"id":"1",
"status":"Active",
"name":[
"ABC",
"DEF",
"XYZ"
],
"country":[
"USA",
"HKG"
]
}
I tried FetchFile -> ConvertRecord, but it converts each CSV record into a separate JSON object.
The ideal way would be to use the QueryRecord processor to run an Apache Calcite SQL query that groups by and collects sets to get your desired output.
But I don't know exactly which functions we can use in Apache Calcite :(
(or)
You can store the data in HDFS, then create a temporary/staging Hive table on top of the HDFS directory.
Use the SelectHiveQL processor to run the query below:
select to_json(
named_struct(
'id',id,
'status',status,
'category',collect_set(category),
'country',collect_set(country)
)
) as jsn
from <db_name>.<tab_name>
group by id,status
This will produce an output flowfile like:
+-----------------------------------------------------------------------------------+
|jsn |
+-----------------------------------------------------------------------------------+
|{"id":"1","status":"Active","category":["DEF","ABC","XYZ"],"country":["HKG","USA"]}|
+-----------------------------------------------------------------------------------+
If you use CSV output, you can remove the header by setting the processor's CSV Header property to false.
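The grouping that the Hive query performs can be sketched in plain Python (not NiFi or Hive code). Rows are grouped by (id, status), and the distinct category and country values are collected per group. Hive's collect_set gives no ordering guarantee (note the DEF, ABC, XYZ order in the output above). This sketch keeps first-seen order instead. group_rows is a hypothetical name.

```python
import csv
import io
import json

def group_rows(csv_text):
    """Group rows by (id, status) and collect distinct category/country values."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["id"], row["status"])
        group = groups.setdefault(
            key,
            {"id": row["id"], "status": row["status"], "category": [], "country": []},
        )
        for column in ("category", "country"):
            # Like collect_set: skip duplicates, but keep first-seen order
            if row[column] not in group[column]:
                group[column].append(row[column])
    return list(groups.values())

if __name__ == "__main__":
    sample = (
        "id,name,category,status,country\n"
        "1,XXX,ABC,Active,USA\n"
        "1,XXX,DEF,Active,HKG\n"
        "1,XXX,XYZ,Active,USA\n"
    )
    for group in group_rows(sample):
        print(json.dumps(group))
```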
The flowfile content is
{
"resourceType": "Patient",
"myArray": [1, 2, 3, 4]
}
I use the EvaluateJsonPath processor to load "myArray" into an attribute named myArray.
Then I use the processor AttributesToJSON to create a json from myArray.
But in the flowfile content, what I get is
{"myArray":"[1,2,3,4]"}
I expected the flowfile to have the following content.
{"myArray":[1,2,3,4]}
How can I get "myArray" as an array again in the content?
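Use record-oriented processors like ConvertRecord instead of the EvaluateJsonPath and AttributesToJSON processors.
Record Reader: JsonPathReader
JsonPathReader configs: add a user-defined property named myArray with the value $.myArray, and use the AvroSchemaRegistry below to look up the schema.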
AvroSchemaRegistry:
{
"namespace": "nifi",
"name": "person",
"type": "record",
"fields": [
{ "name": "myArray", "type": {
"type": "array",
"items": "int"
}}
]
}
JsonRecordSetWriter:
Use the same AvroSchemaRegistry controller service to access the schema.
To access the Avro schema, you need to set the schema.name attribute on the flowfile to the name under which the schema is registered in the AvroSchemaRegistry.
Output flowfile content would be
[{"myArray":[1,2,3,4]}]
(or)
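If your desired output is {"myArray":[1,2,3,4]} without the enclosing [] (array), then use the
ReplaceText processor instead of the AttributesToJSON processor.
ReplaceText configs: set Replacement Strategy to Always Replace and Replacement Value to {"myArray":${myArray}}.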
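Why the array turned into a string: flowfile attributes are always strings. Serializing an attribute value as a JSON value therefore quotes it, whereas splicing the raw attribute text into a template (the ReplaceText approach) keeps it as an array. Below is a small Python sketch of the two behaviours. The function names are hypothetical, and the sketch models only the effect, not NiFi internals.

```python
import json

def attributes_to_json(attributes, names):
    """Model of AttributesToJSON: each attribute value is emitted as a JSON string."""
    return json.dumps({name: attributes[name] for name in names}, separators=(",", ":"))

def replace_text(template, attributes):
    """Model of ReplaceText with ${name} references: raw attribute text is spliced in."""
    out = template
    for name, value in attributes.items():
        out = out.replace("${" + name + "}", value)
    return out

if __name__ == "__main__":
    attrs = {"myArray": "[1,2,3,4]"}  # the attribute value is text, not an array
    print(attributes_to_json(attrs, ["myArray"]))
    print(replace_text('{"myArray":${myArray}}', attrs))
```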
Not all the credit goes to me, but I was pointed to a better, simpler way to achieve this. There are two ways.
Solution 1: the simplest and most elegant
Use the NiFi JoltTransformJSON processor. The processor can use NiFi Expression Language and attributes on both the left- and right-hand sides of the specification syntax. This lets you quickly use the Jolt default spec to add new fields (from flow-file attributes) to a new or existing JSON document.
Ex:
{"customer_id": 1234567, "vckey_list": ["test value"]}
Both of those field values are stored in flow-file attributes as the result of an EvaluateJsonPath operation; assume the attributes are "customer_id_attr" and "vckey_list_attr". We can simply generate a new JSON document from those flow-file attributes with the "default" Jolt spec and the right-hand-side syntax. You can even add additional Expression Language functions to the processing:
[
{
"operation": "default",
"spec": {
"customer_id": ${customer_id_attr},
"vckey_list": ${vckey_list_attr:toLower()}
}
}
]
This worked for me even when storing the entire JSON, path of "$", in a flow-file attribute.
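A simplified Python model of what happens in Solution 1. NiFi first substitutes the attribute values into the spec text, and Jolt's default operation then adds any keys that are missing from the input. Both helpers are hypothetical stand-ins. substitute handles only plain ${name} references, not Expression Language functions such as :toLower(). jolt_default ignores Jolt's null and array handling, so see the Jolt Defaultr documentation for the exact rules.

```python
import json
import re

def substitute(spec_text, attributes):
    """Hypothetical stand-in for Expression Language: plain ${name} references only."""
    return re.sub(r"\$\{(\w+)\}", lambda m: attributes[m.group(1)], spec_text)

def jolt_default(doc, spec):
    """Simplified Jolt 'default': add missing keys, recursing into nested objects."""
    for key, value in spec.items():
        if key not in doc:
            doc[key] = value
        elif isinstance(doc[key], dict) and isinstance(value, dict):
            jolt_default(doc[key], value)
    return doc

SPEC = ('[{"operation": "default", "spec": {"customer_id": ${customer_id_attr}, '
        '"vckey_list": ${vckey_list_attr}}}]')

if __name__ == "__main__":
    attrs = {"customer_id_attr": "1234567", "vckey_list_attr": '["test value"]'}
    chain = json.loads(substitute(SPEC, attrs))  # the spec must be valid JSON after substitution
    print(json.dumps(jolt_default({}, chain[0]["spec"])))
```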
Solution 2: more complicated and uglier
Use a sequence of NiFi ReplaceText processors. First, use a ReplaceText processor to append the desired flow-file attribute to the file content.
If you are generating a totally new JSON document, this would do it. If you are trying to modify an existing one, you would first need to append the desired keys, then use ReplaceText again to format them properly as new keys in the existing JSON, from
{"original_json_key": original_json_obj}{"customer_id": 1234567, "vckey_list": ["test value"]}
to
{"original_json_key": original_json_obj, "customer_id": 1234567, "vckey_list": ["test value"]}
using a second ReplaceText processor.
Then use Jolt to do any further processing (which is why Solution 1 always makes more sense).
Hope this helps. I spent about half a day figuring out Solution 2 and was pointed to Solution 1 by someone with more experience in NiFi.
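The two ReplaceText steps of Solution 2 can be modelled in Python as an append followed by a regex substitution that turns the "}{" seam into ", ". The helper names and the sample content are hypothetical. The regex is fragile in the same way a ReplaceText regex would be: it assumes the first "}{" in the text is the seam between the two objects, so a "}{" inside a string value would break it.

```python
import json
import re

def append_text(content, extra):
    """First ReplaceText step (modelled): append the attribute's JSON to the content."""
    return content + extra

def merge_objects(text):
    """Second ReplaceText step (modelled): replace the first '}{' seam with ', '."""
    return re.sub(r"\}\s*\{", ", ", text, count=1)

if __name__ == "__main__":
    original = '{"original_json_key": {"a": 1}}'  # hypothetical existing content
    extra = '{"customer_id": 1234567, "vckey_list": ["test value"]}'
    merged = merge_objects(append_text(original, extra))
    print(merged)
    print(json.loads(merged))  # parses as a single JSON object
```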
Initially, I am querying a table to get two values: id and payload. The payload field is itself JSON, but stored as a string. Please see the payload string below.
{
"schema": "http://schemas.viacom.com/what/is/the/path#",
"op": "delete",
"entity": "movie",
"entity_identifier": {
"series_code": 53709,
"episode_code": 1
},
"entity_vmid": "",
"short_name": "",
"title": ""
}
I want the series_code and episode_code values. I tried the following approaches, without success:
ExecuteSQL --> ConvertAvrotoJSON --> EvaluateJSON($.payload.entity_identifier.series_code)
ExecuteSQL --> ConvertAvrotoJSON --> AttributestoJSON --> EvaluateJSON
Please help.
You can use the EvaluateJsonPath processor to evaluate JsonPath expressions against the content of the flowfile. You add one user-defined property per value you want to extract. Set the Destination value to flowfile-attribute to extract the value into an attribute which will be added to the flowfile, or flowfile-content to generate a new flowfile with the extracted value as the sole content.
Given the JSON you provided, the two path expressions you would use are:
$.entity_identifier.series_code -> 53709
$.entity_identifier.episode_code -> 1
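A likely reason $.payload.entity_identifier.series_code finds nothing is that, after ConvertAvroToJSON, payload is a JSON string rather than a nested object, so the path cannot descend into it. The two paths above apply to the payload document itself. Below is a Python sketch of the difference; the extract_codes helper and the outer row shape {"id": ..., "payload": "..."} are hypothetical. In NiFi terms, one possible route (an untested assumption) is a first EvaluateJsonPath with $.payload and Destination set to flowfile-content, followed by a second EvaluateJsonPath with the two paths above.

```python
import json

def extract_codes(record):
    """Return (series_code, episode_code) from a row whose payload is a JSON string."""
    payload = record["payload"]
    if isinstance(payload, str):
        # The payload column holds serialized JSON; parse it before navigating into it
        payload = json.loads(payload)
    identifier = payload["entity_identifier"]
    return identifier["series_code"], identifier["episode_code"]

if __name__ == "__main__":
    payload_text = json.dumps({
        "op": "delete",
        "entity": "movie",
        "entity_identifier": {"series_code": 53709, "episode_code": 1},
    })
    row = {"id": 1, "payload": payload_text}  # hypothetical row from the query
    print(extract_codes(row))
```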