Best approach to determine Oracle INSERT or UPDATE using NiFi - apache-nifi

I have a JSON flow-file and I need determine if I should be doing an INSERT or UPDATE. The trick is to only update the columns that match the JSON attributes. I have an ExecuteSQL working and it returns executesql.row.count, however I've lose the original JSON flow-file which I was planing to use as a routeonattribute. I'm trying to get the MergeContent to join the ExecuteSQL (dump the Avro output, I only need the executesql.row.count attribute) with the JSON flow. I've set follow before I do the ExecuteSQL:
fragment.count=2
fragment.identifier=${UUID()}
fragment.index=${nextInt()}
Alternatively I could create a MERGE, if there is a way to loop through the list of JSON attributes that match the Oracle table?

How large is your JSON? If it's small, you might consider using ExtractText (matching the whole document) to get the JSON into an attribute. Then you can run ExecuteSQL, then ReplaceText to put the JSON back into the content (overwriting the Avro results). If your JSON is large, you could set up a DistributedMapCacheServer and (in a separate flow) run ExecuteSQL and store the value or executesql.row.count into the cache. Then in the JSON flow you can use FetchDistributedMapCache with the "Put Cache Value In Attribute" property set.
If you only need the JSON to use RouteOnAttribute, perhaps you could use EvaluateJsonPath before ExecuteSQL, so your conditions are already in attributes and you can replace the flow file contents.
If you want to use MergeContent, you can set fragment.count to 2, but rather than using the UUID() function, you could set "parent.identifier" to "${uuid}" using UpdateAttribute, then DuplicateFlowFile to create 2 copies, then UpdateAttribute to set "fragment.identifier" to "${parent.identifier}" and "fragment.index" to "${nextInt():mod(2)}". This gives a mergeable set of two flow files, you can route on fragment.index being 0 or 1, sending one to ExecuteSQL and one through the other flow, joining back up at MergeContent.
Another alternative is to use ConvertJSONToSQL set to "UPDATE", and if it fails, route those flow files to another ConvertJSONToSQL processor set to "INSERT".

Related

Apache NiFi how to check output from each processor

I am new to using Apache NiFi, and I trying to create a template that takes a JSON file and turns it into a set of SQL insert statements.
So far I have created a template that takes the JSON file and I have got it to the point of PutSQL. There is no database to connect to at the moment, but what I have not been able to check is the output. Can this be done? What I need to check is whether the array of JSON has been turned into a INSERT per element in the array
As far as inpecting the output, what does your flow look like? If you have something like ConvertJSONToSQL -> PutSQL, you can leave PutSQL stopped and run ConvertJSONToSQL, then you will see FlowFile(s) in the connection between the two processors. Then you can right-click on the connection and choose List Queue, then click the "eye" icon on the right for the FlowFile you wish to inspect. That will show you the contents of the FlowFile right before it goes into PutSQL.
Having said all that, if your JSON file contains fields that correspond to columns in your database, consider PutDatabaseRecord instead of ConvertJSONToSQL -> PutSQL. That can use a JsonTreeReader to parse each record, and it will generate and execute the necessary SQL as a prepared statement using the values in all records of the FlowFile. That way you don't need to generate the SQL yourself or worry about fragmented transactions or any of that.

Apache Nifi - how to write Json to a database table column?

I have an json array that I need to write to a database (as text).
I have two options:
Write as an array of objects, so the field would contain [{},{},{}]
Write each record as an object, so the field would contain {}
The problem is that nifi does not know how to map the json object to a specific database field on PutDatabaseRecord.
How to I map it?
Here is my flow:
You should use a combination of
ConvertAvroToJSON >> SplitJson(if you have multiple) >> ConvertJsontoSQL >> PutSQL
In the convertJsonToSQL you will have to set the db,schema,table for the incoming json payload to map to.
The config options are self explainatory for the convertJsonToSQL processor

NiFi make single Database call after processing multiple flowfiles

I have a below scenario, I am trying
get new files list with ListFile processor
set a constant variable zipFilesBundleConstant = listBundle on each flowfile
Put the list to Database
Get all the list of files old and new from Database to process further with ExecuteSQL processor. (Here I want to make only one Database call to fetch complete list old and new, but ExecuteSQL is being called for all the flowfiles)
I tried keeping MergeContent processor with zipFilesBundleConstant as Correlation Attribute Name before ExecuteSQL to combine all the flowfiles but that is not working as expected and it merges some but always gives me multiple flowfiles.
Can anyone please help me with a solution on how to make a one call after inserting the new files list into the database.
You can use ExecuteSQL processor has separate workflow to fetch existing old files list from the database with the scheduling strategy as per the requirement.
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#scheduling-tab

Apache NiFi EvaluateXQuery

I am trying to use NiFi to break up an XML document into multiple flowfiles. The XML contains many elements from a web service. I am trying to process each event separately. I think EvaluateXQuery is the appropriate processor but I can't figure out to add my XQuery if the destination is a flowfile rather than an attribute. I know I have to add a property /value pair in the processor config/properties page but I can't figure out what the property name should be. Does it matter?
If you only need to extract one element, then yes, add a dynamic property with any name and set the destination to flowfile-content.
You can add multiple dynamic properties to the processor to extract elements into attributes on the outgoing flowfile. If you want to then replace the flowfile content with the attributes, you can use a processor like ReplaceText or AttributesToJson to combine multiple attributes into the flowfile content.
A couple things to remember:
extracting multiple large elements to attributes is an anti-pattern, as this will hurt performance on the heap
you might be better off splitting the XML file into chunks via SplitXML first in order to then extract a single element per chunk into the flowfile content (or an attribute)

How can I filter flow files upon the result of an SQL query?

Would it be possible to route flow files according to the result of an SQL query which returns a single row result? For example, if the result is '1' the flow file will be processed; otherwise, it will be ignored.
Solution
The following approach worked best for me.
Use ExecuteSQL processor in order to run filtering SQL query. The query was written to produce either a single record (match) or an empty record set (no match) in a way suggested by Shu.
Connect ExecuteSQL to RouteOnAttribute processor in order to filter out unmatched flow files using the following value of routing property value ${executesql.row.count:replaceNull(0):gt(0)}
Notice, that the original content of a flow file will be lost after applying ExecuteSQL. It's not an issue in my case, because I do filtering before processing flow file content and my SQL query is based entirely on the flow file attributes and not on its content. Though in a more general scenario, when the flow file content is modified by the incoming part of the flow, one should save file content somewhere (e.g. file system) and restore it after the filtering part has applied.
You can add where clause in your sql query where <field_name> = 1 then we are only going to have output a flowfile when the result value =1.
(or)
Checking the data in NiFi:
We are going to have AVRO format data as the result of SQL query so you can use
option1:ConvertAvroToJson Processor:
Convert the AVRO data into JSON format then extract the value from the json content as attribute using EvaluateJsonPath processor.
Then use RouteOnAttribute processor add new property using NiFi expression language equals function compare the value and route the flowfile to matched relation.
Refer to this link more details regards to EvaluateJsonpath and RouteOnAttribute processor configs.
option2: Using QueryRecord processor:
By using QueryRecord processor we can run SQL queries on the content of the flowfile
Add new property to the processor as
select * from FLOWFILE where <filed_name> =1
Feed the property relation to the other processor
Refer to this link for more details regarding QueryRecord processor usage.

Resources