Apache NiFi - how to write JSON to a database table column?

I have a JSON array that I need to write to a database (as text).
I have two options:
Write as an array of objects, so the field would contain [{},{},{}]
Write each record as an object, so the field would contain {}
The problem is that NiFi does not know how to map the JSON object to a specific database column in PutDatabaseRecord.
How do I map it?
Here is my flow:

You should use a combination of
ConvertAvroToJSON >> SplitJson (if you have multiple) >> ConvertJSONToSQL >> PutSQL
In ConvertJSONToSQL you will have to set the database, schema, and table name for the incoming JSON payload to map to.
The config options are self-explanatory for the ConvertJSONToSQL processor.
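For illustration, a minimal sketch of what ConvertJSONToSQL produces, assuming a hypothetical target table EVENTS with columns ID and PAYLOAD: a flat JSON record such as
{"ID": 1, "PAYLOAD": "[{},{},{}]"}
becomes a parameterized statement in the flow file content, roughly
INSERT INTO EVENTS (ID, PAYLOAD) VALUES (?, ?)
with the values carried in attributes like sql.args.1.value = 1 and sql.args.2.value = [{},{},{}], which PutSQL then binds and executes. The incoming JSON must be flat, so a nested object you want stored as text in a column should be stringified first.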

Related

NiFi - denormalize JSON message and store in Hive

I am consuming a nested JSON Kafka message and, after some transformations, storing it into Hive.
The requirement is that the JSON contains several nested arrays and we have to denormalize it so that each element in an array forms a separate row in the Hive table. Would JoltTransform or SplitJson work, or do I need to write a Groovy script for this?
Sample input -
{"TxnMessage": {"HeaderData": {"EventCatg": "F"},"PostInrlTxn": {"Key": {"Acnt": "1234567890","Date": "20181018"},"Id": "3456","AdDa": {"Area": [{"HgmId": "","HntAm": 0},{"HgmId": "","HntAm": 0}]},"escTx": "Reload","seTb": {"seEnt": [{"seId": "CAKE","rCd": 678},{"seId": "","rCd": 0}]},"Bal": 6766}}}
Expected Output -
{"TxnMessage.PostInrlTxn.AdDa.Area.HgmId":"","TxnMessage.PostInrlTxn.AdDa.Area.HntAm":0,"TxnMessage.HeaderData.EventCatg":"F","TxnMessage.PostInrlTxn.Key.Acnt":"1234567890","TxnMessage.PostInrlTxn.Key.Date":"20181018","TxnMessage.PostInrlTxn.Id":"3456","TxnMessage.PostInrlTxn.escTx":"Reload","TxnMessage.PostInrlTxn.Bal":6766}
{"TxnMessage.PostInrlTxn.AdDa.Area.HgmId":"","TxnMessage.PostInrlTxn.AdDa.Area.HntAm":0,"TxnMessage.HeaderData.EventCatg":"F","TxnMessage.PostInrlTxn.Key.Acnt":"1234567890","TxnMessage.PostInrlTxn.Key.Date":"20181018","TxnMessage.PostInrlTxn.Id":"3456","TxnMessage.PostInrlTxn.escTx":"Reload","TxnMessage.PostInrlTxn.Bal":6766}
{"TxnMessage.PostInrlTxn.seTb.seEnt.seId":"CAKE","TxnMessage.PostInrlTxn.seTb.seEnt.rCd":678,"TxnMessage.HeaderData.EventCatg":"F","TxnMessage.PostInrlTxn.Key.Acnt":"1234567890","TxnMessage.PostInrlTxn.Key.Date":"20181018","TxnMessage.PostInrlTxn.Id":"3456","TxnMessage.PostInrlTxn.escTx":"Reload","TxnMessage.PostInrlTxn.Bal":6766}
{"TxnMessage.PostInrlTxn.seTb.seEnt.seId":"","TxnMessage.PostInrlTxn.seTb.seEnt.rCd":0,"TxnMessage.HeaderData.EventCatg":"F","TxnMessage.PostInrlTxn.Key.Acnt":"1234567890","TxnMessage.PostInrlTxn.Key.Date":"20181018","TxnMessage.PostInrlTxn.Id":"3456","TxnMessage.PostInrlTxn.escTx":"Reload","TxnMessage.PostInrlTxn.Bal":6766}

Apache NiFi - How to add/pass attributes to a Processor, not a flow file

My Purpose
Execute a SQL query and write the result (flow file) directly to a file using my own schema.
Please see the explanation below.
Solution 1 (use 4 processors)
ExecuteSQL, where the records have an auto-generated (embedded) Avro schema.
ConvertRecord: the Record Reader just uses the embedded Avro schema and the Record Writer uses my own schema from HortonworksSchemaRegistry, so I pass the attributes - 'schema.name' and 'schema.version' - using UpdateAttribute.
It works.
Solution 2 (use ExecuteSqlRecord)
It may look like this:
ExecuteSQLRecord has a Record Writer,
and the Record Writer gets the Avro schema from HortonworksSchemaRegistry using the 'schema.name' and 'schema.version' attributes.
But ExecuteSQLRecord does not support user-defined attributes.
So
Is this the way to use the ExecuteSQLRecord processor?
How to add attributes to a processor?
As of now, users cannot add new properties to ExecuteSQL* processors.
Below are the ways you can try:
Using GenerateFlowFile processor
Add schema.name attribute with some value.
Flow:
1.GenerateFlowFile //add schema.name attribute with value.
2.ExecuteSQLRecord
3.PutFile
(or)
Hardcode the schema.name value in the RecordWriter controller service. In this case you don't need the GenerateFlowFile processor.
Flow:
1.ExecuteSQLRecord //hardcode schema.name property value
2.PutFile
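For reference, a minimal sketch of the Record Writer configuration that picks up the schema (the registry service and schema names here are only examples):
Schema Access Strategy = Use 'Schema Name' Property
Schema Registry = HortonworksSchemaRegistry
Schema Name = ${schema.name}
With the GenerateFlowFile approach, the schema.name attribute set upstream is resolved by the ${schema.name} expression; with the hardcoded approach you would put the literal schema name in the Schema Name property instead.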

How can I filter flow files upon the result of an SQL query?

Would it be possible to route flow files according to the result of an SQL query which returns a single row result? For example, if the result is '1' the flow file will be processed; otherwise, it will be ignored.
Solution
The following approach worked best for me.
Use the ExecuteSQL processor to run the filtering SQL query. The query was written to produce either a single record (match) or an empty record set (no match), in the way suggested by Shu.
Connect ExecuteSQL to a RouteOnAttribute processor to filter out unmatched flow files, using the following routing property value: ${executesql.row.count:replaceNull(0):gt(0)}
Note that the original content of a flow file will be lost after applying ExecuteSQL. It's not an issue in my case, because I do the filtering before processing the flow file content, and my SQL query is based entirely on the flow file attributes and not on its content. In a more general scenario, where the flow file content is modified by the incoming part of the flow, one should save the content somewhere (e.g. the file system) and restore it after the filtering part has been applied.
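As an illustration of such a filtering query (the table and attribute names are hypothetical), ExecuteSQL can reference flow file attributes through expression language:
SELECT 1 FROM allowed_customers WHERE customer_id = '${customer.id}'
A match returns one row (executesql.row.count = 1) and the flow file passes the RouteOnAttribute check above; no match returns an empty result set and the flow file is filtered out.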
You can add a where clause to your SQL query, where <field_name> = 1, so that a flow file is only output when the result value is 1.
(or)
Checking the data in NiFi:
We are going to have Avro-format data as the result of the SQL query, so you can use:
Option 1: ConvertAvroToJSON processor:
Convert the Avro data into JSON format, then extract the value from the JSON content as an attribute using the EvaluateJsonPath processor.
Then use a RouteOnAttribute processor: add a new property that uses the NiFi Expression Language equals function to compare the value and route the flow file to the matched relation.
Refer to this link for more details regarding EvaluateJsonPath and RouteOnAttribute processor configs.
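A minimal sketch of option 1 (the property and field names are hypothetical): in EvaluateJsonPath set Destination = flowfile-attribute and add a dynamic property such as
match.value = $[0].<field_name> (or $.<field_name> if the result is a single JSON object)
then in RouteOnAttribute add a routing property like
matched = ${match.value:equals(1)}
and connect the 'matched' relationship to the rest of the flow.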
Option 2: Using the QueryRecord processor:
By using the QueryRecord processor we can run SQL queries on the content of the flow file.
Add a new property to the processor such as
select * from FLOWFILE where <field_name> = 1
Feed that property's relationship to the next processor.
Refer to this link for more details regarding QueryRecord processor usage.
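For illustration, a minimal QueryRecord setup under the same assumptions (the property name 'matched' and the field name are hypothetical): configure an Avro Record Reader and a Record Writer on the processor, then add a dynamic property
matched = select * from FLOWFILE where <field_name> = 1
QueryRecord creates a relationship named after the property ('matched' here); flow files whose content satisfies the query are routed to it, and empty results can be filtered out (for example via the processor's Include Zero Record FlowFiles setting, if available in your NiFi version).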

Best approach to determine Oracle INSERT or UPDATE using NiFi

I have a JSON flow file and I need to determine if I should be doing an INSERT or UPDATE. The trick is to only update the columns that match the JSON attributes. I have an ExecuteSQL working and it returns executesql.row.count, however I've lost the original JSON flow file which I was planning to use in a RouteOnAttribute. I'm trying to get MergeContent to join the ExecuteSQL output (dumping the Avro output, I only need the executesql.row.count attribute) with the JSON flow. I've set the following before I do the ExecuteSQL:
fragment.count=2
fragment.identifier=${UUID()}
fragment.index=${nextInt()}
Alternatively, I could create a MERGE, if there is a way to loop through the list of JSON attributes that match the Oracle table columns?
How large is your JSON? If it's small, you might consider using ExtractText (matching the whole document) to get the JSON into an attribute. Then you can run ExecuteSQL, then ReplaceText to put the JSON back into the content (overwriting the Avro results). If your JSON is large, you could set up a DistributedMapCacheServer and (in a separate flow) run ExecuteSQL and store the value of executesql.row.count into the cache. Then in the JSON flow you can use FetchDistributedMapCache with the "Put Cache Value In Attribute" property set.
If you only need the JSON to use RouteOnAttribute, perhaps you could use EvaluateJsonPath before ExecuteSQL, so your conditions are already in attributes and you can replace the flow file contents.
If you want to use MergeContent, you can set fragment.count to 2, but rather than using the UUID() function, you could set "parent.identifier" to "${uuid}" using UpdateAttribute, then DuplicateFlowFile to create 2 copies, then UpdateAttribute to set "fragment.identifier" to "${parent.identifier}" and "fragment.index" to "${nextInt():mod(2)}". This gives a mergeable set of two flow files; you can route on fragment.index being 0 or 1, sending one to ExecuteSQL and one through the other flow, joining back up at MergeContent.
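A sketch of that attribute setup, in processor order (the property values follow the answer; the DuplicateFlowFile and MergeContent settings are assumptions):
UpdateAttribute: parent.identifier = ${uuid}
DuplicateFlowFile: Number of Copies = 1 (so you end up with two flow files in total)
UpdateAttribute: fragment.identifier = ${parent.identifier}, fragment.index = ${nextInt():mod(2)}, fragment.count = 2
RouteOnAttribute then sends fragment.index 0 to ExecuteSQL and 1 through the JSON path, and MergeContent with Merge Strategy = Defragment joins the pair back together.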
Another alternative is to use ConvertJSONToSQL set to "UPDATE", and if it fails, route those flow files to another ConvertJSONToSQL processor set to "INSERT".

Apache NiFi - get the file attributes and construct the JSON through a custom processor

I am using a custom processor for CSV-to-JSON conversion, which converts the CSV file data into a JSON array containing JSON objects of the data.
My requirement is to get the file attributes like filename, uuid, path, etc. and construct a JSON object from these.
Question:
How can I get the related attributes of the file and construct a JSON object, appending it to the same JSON constructed before?
I've only been working with Apache NiFi for a few days, so I'm just going with the exact requirements now using the custom processor.
I can't speak to which attributes are being written for your custom processor, but there is a set of core attributes that most/all flow files have, such as filename and uuid. If you are using GetFile or ListFile/FetchFile to read in your CSV file, you will have those and a number of other attributes available (see the doc for more info).
When you have a flow file that has the appropriate attributes set, you can use the AttributesToJSON processor to create a JSON object containing a flat list of the specified attributes, and that object can replace the flow file content or become its own attribute (named 'JSONAttributes') depending on the setting of the "Destination" property of AttributesToJSON.
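For illustration (the attribute values shown are hypothetical), an AttributesToJSON configuration such as
Attributes List = filename, uuid, path
Destination = flowfile-attribute
would add a JSONAttributes attribute containing something like
{"filename":"data.csv","uuid":"0f8e...","path":"./"}
which a downstream processor (or the custom processor itself) can then merge into the JSON array built from the CSV content.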
