The ExecuteSQL processor doesn't work after connecting with another processor - apache-nifi

When I don't connect any processor as an incoming one, ExecuteSQL works perfectly fine, as shown in the screenshot
Screenshot#1
But when I connect another processor to it, no flowfiles come out of the ExecuteSQL processor.
Screenshot#2
Does anyone know how I could make it work? Thank you in advance :-)

Check the NiFi docs and you'll find this description:
Executes provided SQL select query. Query result will be converted to Avro format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query, and the query may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. FlowFile attribute 'executesql.row.count' indicates how many rows were selected.
It tells you that you have to use some special attributes when triggering it via a FlowFile, something like sql.args.1.type and sql.args.1.value.
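A minimal sketch of how that could be wired up (the table, column, and values here are assumptions for illustration, not from your flow): an upstream UpdateAttribute sets the parameter attributes, and the query references them with ?.

UpdateAttribute (upstream of ExecuteSQL):
  sql.args.1.type  = 12              (12 = java.sql.Types.VARCHAR)
  sql.args.1.value = some_node_id

ExecuteSQL, SQL select query:
  SELECT * FROM my_table WHERE node_id = ?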

Related

NiFi: How can I merge exactly 3 records into a single one?

I'm working with NiFi, receiving several JSON objects in the same file. Intending to modify those JSONs properly, I split the file into several single flowfiles, and here is where the problems start.
For each id and datetime there are 3 flowfiles that I would like to merge into a single flowfile. I've tried NiFi's MergeRecord processor, but it only works when it wants to. When I have a few records, it seems to work fine. But when I have "a lot of" records, e.g. more than 70, it breaks: sometimes it merges only 2 records into one, sometimes it lets a single record pass through directly.
merge_key is a string attribute based on id and datetime.
I need to take exactly 3 records and merge them into a single one.
If there were a way to order the flowfiles and take the first n elements every 5 seconds, I think that could help. But I would prefer to be sure it works properly without any such "help"...
For ordering the flowfiles we can use the EnforceOrder processor, as described in the documentation:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.EnforceOrder/
NiFi also provides the following prioritizers for incoming flowfiles:
FirstInFirstOutPrioritizer
NewestFlowFileFirstPrioritizer
OldestFlowFileFirstPrioritizer
PriorityAttributePrioritizer
Refer to the link below for more details:
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html
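On the "exactly 3 records" requirement itself, a MergeRecord configuration along these lines might help; this is only a sketch, and the bin limits and timeout values are assumptions rather than tested settings:

MergeRecord:
  Merge Strategy              = Bin-Packing Algorithm
  Correlation Attribute Name  = merge_key
  Minimum Number of Records   = 3
  Maximum Number of Records   = 3
  Maximum Number of Bins      = large enough for the distinct merge_key values in flight
  Max Bin Age                 = 5 sec    (flush incomplete bins instead of holding them forever)

One common cause of the "sometimes it merges only 2 records" symptom is too few bins: when the maximum number of bins is exceeded, the oldest bin is flushed even if it is not yet full.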

Record Oriented InvokeHTTP Processor

I have a csv file
longtitude,lagtitude
34.094933,-118.30674
34.095028,-118.306625
(more to go)
I use the UpdateRecord processor (which supports record processing) with a CSVRecordSetWriter, using RecordPath (https://nifi.apache.org/docs/nifi-docs/html/record-path-guide.html) to prepare the gis field (a possible configuration is sketched after the sample output below).
longtitude,lagtitude,gis
34.094933,-118.30674,"34.094933,-118.30674"
34.095028,-118.306625,"34.095028,-118.306625"
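One way that UpdateRecord step could be configured (a sketch; it assumes the field names shown in the CSV above):

UpdateRecord:
  Record Reader               = CSVReader
  Record Writer               = CSVRecordSetWriter
  Replacement Value Strategy  = Record Path Value
  /gis                        = concat(/longtitude, ',', /lagtitude)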
My next step is to pass gis as an input parameter to an HTTP API, which returns info (poi) that I would like to store.
longtitude,lagtitude,gis,poi
34.094933,-118.30674,"34.094933,-118.30674","Restaurant A"
34.095028,-118.306625,"34.095028,-118.306625","Cinema X"
It seems that the InvokeHTTP processor does not process data in a record-oriented way. Is there any possible solution to prepare the above without splitting the file further?
When you want to enrich each record like this, it is typically handled in NiFi by using the LookupRecord processor with a LookupService. The idea is: for each record in the incoming flow file, pass some fields of the record to the lookup service, then take the results of the lookup and store them back in the record.
For your example it sounds like you would want a RestLookupService:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-lookup-services-nar/1.9.1/org.apache.nifi.lookup.RestLookupService/index.html
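A rough sketch of how the two pieces might fit together; the endpoint URL, the coordinate key, and the response field are assumptions for illustration, so check them against the RestLookupService documentation linked above:

RestLookupService:
  URL            = https://example.com/poi?gis=${gis}    (example endpoint; expression language is evaluated against the lookup coordinates)
  Record Reader  = a JsonTreeReader matching the API response

LookupRecord:
  Lookup Service     = the RestLookupService above
  Result RecordPath  = /poi
  gis                = /gis    (dynamic property: passes the record's gis field as the 'gis' coordinate)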

How can I filter flow files upon the result of an SQL query?

Would it be possible to route flow files according to the result of an SQL query which returns a single row result? For example, if the result is '1' the flow file will be processed; otherwise, it will be ignored.
Solution
The following approach worked best for me.
Use the ExecuteSQL processor to run the filtering SQL query. The query was written to produce either a single record (match) or an empty record set (no match), in the way suggested by Shu.
Connect ExecuteSQL to a RouteOnAttribute processor to filter out unmatched flow files, using the following routing property value: ${executesql.row.count:replaceNull(0):gt(0)}
Notice that the original content of a flow file will be lost after applying ExecuteSQL. That is not an issue in my case, because I do the filtering before processing the flow file content, and my SQL query is based entirely on the flow file attributes and not on its content. In a more general scenario, where the flow file content is modified by the incoming part of the flow, one should save the file content somewhere (e.g. the file system) and restore it after the filtering part has been applied.
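Putting the two steps together, a minimal sketch (the table, column, and attribute names in the query are placeholders; only the routing expression is taken from the steps above):

ExecuteSQL, SQL select query:
  SELECT 1 FROM my_table WHERE my_key = '${my.attribute}' AND my_flag = 1

RouteOnAttribute, dynamic property 'matched':
  ${executesql.row.count:replaceNull(0):gt(0)}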
You can add a where clause to your SQL query, where <field_name> = 1; then we will only get an output flowfile when the result value is 1.
(or)
Checking the data in NiFi:
We are going to have Avro-format data as the result of the SQL query, so you can use one of the following options.
Option 1: ConvertAvroToJSON processor
Convert the Avro data into JSON format, then extract the value from the JSON content into an attribute using the EvaluateJsonPath processor.
Then use a RouteOnAttribute processor: add a new property that uses the NiFi Expression Language equals function to compare the value and route the flowfile to the matched relation.
Refer to this link for more details regarding EvaluateJsonPath and RouteOnAttribute processor configs.
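A sketch of how Option 1 could be configured (the attribute and field names are placeholders; it assumes ConvertAvroToJSON's default JSON container option 'array'):

EvaluateJsonPath:
  Destination  = flowfile-attribute
  field.value  = $[0].field_name    (dynamic property: pulls the field out of the JSON array)

RouteOnAttribute, dynamic property 'matched':
  ${field.value:equals('1')}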
Option 2: Using the QueryRecord processor
Using the QueryRecord processor we can run SQL queries against the content of the flowfile.
Add a new property to the processor, e.g.
select * from FLOWFILE where <field_name> = 1
Feed that property's relationship to the next processor.
Refer to this link for more details regarding QueryRecord processor usage.
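A sketch of the Option 2 setup (the reader/writer choices and the property name 'matched' are assumptions; each dynamic property on QueryRecord becomes an outgoing relationship):

QueryRecord:
  Record Reader  = AvroReader          (ExecuteSQL output is Avro)
  Record Writer  = e.g. JsonRecordSetWriter
  matched        = select * from FLOWFILE where <field_name> = 1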

NiFi record counts

I am getting files from a remote server using NiFi; my files are as follows:
timestamp (ms), nodeID,value
12345,x,12.4
12346,x,12.7
12348,x,13.4
12356,x,13,6
12355,y,12.0
Right now I just get, fetch, and split the lines and send them to Kafka. But beforehand, I need to apply a checksum approach to my records and aggregate them based on timestamp. What I need to do is add an additional column to my content and count the records grouped by aggregated timestamp, for example aggregating per 10 milliseconds and per nodeID:
timestamp (ms), nodeID,value, counts
12345,x,12.4,3
12346,x,12.7,3
12348,x,13.4,3
12356,x,13,6,1
12355,y,12.0,1
How can I do the above in NiFi? I am totally new to NiFi, but I need to add the above functionality to my NiFi process. I am currently using the NiFi flow below.
This may not answer your question directly, but you should consider refactoring your flow to use the "record" processors. It would greatly simplify things and would probably get you closer to being able to do the aggregation.
The idea is to not split up the records, and instead process them in place. Given your current flow, the 4 processors after FetchSFTP would likely change to a single ConvertRecord processor that converts CSV to JSON. You would first need to define a simple Avro schema for your data.
Once you have the record processing set up, you might be able to use PartitionRecord to partition the records by the node id, and from there the missing piece would be how to count by the timestamps.
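One possibility for that missing piece, strictly as an untested sketch: QueryRecord runs SQL (Apache Calcite) against the flowfile content, so a query like the one below could compute counts per 10 ms bucket and nodeID, assuming the timestamp column is renamed to a plain identifier such as ts in your schema (mixed-case field names may also need double-quoting). Note that it produces one row per (nodeID, bucket), so the counts would still have to be joined back onto the original rows.

select nodeID, ts / 10 as bucket, count(*) as counts
from FLOWFILE
group by nodeID, ts / 10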
Some additional resources...
https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi
https://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries
https://www.slideshare.net/BryanBende/apache-nifi-record-processing

Best approach to determine Oracle INSERT or UPDATE using NiFi

I have a JSON flow-file and I need to determine whether I should be doing an INSERT or an UPDATE. The trick is to only update the columns that match the JSON attributes. I have an ExecuteSQL working and it returns executesql.row.count, however I've lost the original JSON flow-file, which I was planning to use with RouteOnAttribute. I'm trying to get MergeContent to join the ExecuteSQL output (dropping the Avro content, I only need the executesql.row.count attribute) with the JSON flow. I've set the following before I do the ExecuteSQL:
fragment.count=2
fragment.identifier=${UUID()}
fragment.index=${nextInt()}
Alternatively, I could create a MERGE, if there is a way to loop through the list of JSON attributes that match the Oracle table columns?
How large is your JSON? If it's small, you might consider using ExtractText (matching the whole document) to get the JSON into an attribute. Then you can run ExecuteSQL, followed by ReplaceText to put the JSON back into the content (overwriting the Avro results). If your JSON is large, you could set up a DistributedMapCacheServer and (in a separate flow) run ExecuteSQL and store the value of executesql.row.count in the cache. Then in the JSON flow you can use FetchDistributedMapCache with the "Put Cache Value In Attribute" property set.
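A sketch of the small-JSON variant (the attribute name json.body is a placeholder; ExtractText's Maximum Buffer Size and Maximum Capture Group Length defaults may need to be raised to hold the whole document):

ExtractText:
  json.body  = (?s)(^.*$)    (dynamic property: capture the entire content into the json.body attribute)

...ExecuteSQL runs here, producing executesql.row.count...

ReplaceText:
  Evaluation Mode       = Entire text
  Replacement Strategy  = Always Replace
  Replacement Value     = ${json.body}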
If you only need the JSON to use RouteOnAttribute, perhaps you could use EvaluateJsonPath before ExecuteSQL, so your conditions are already in attributes and you can replace the flow file contents.
If you want to use MergeContent, you can set fragment.count to 2, but rather than using the UUID() function, you could set "parent.identifier" to "${uuid}" using UpdateAttribute, then DuplicateFlowFile to create 2 copies, then UpdateAttribute to set "fragment.identifier" to "${parent.identifier}" and "fragment.index" to "${nextInt():mod(2)}". This gives a mergeable set of two flow files; you can route on fragment.index being 0 or 1, sending one to ExecuteSQL and the other through the rest of the flow, joining back up at MergeContent (see the sketch below).
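Laid out as processor properties, the sequence described above would look roughly like this (the relationship names to.sql and to.json are placeholders):

UpdateAttribute (before duplication):
  parent.identifier  = ${uuid}
DuplicateFlowFile:
  Number of Copies   = 1    (original + copy = 2 flow files)
UpdateAttribute (after duplication):
  fragment.identifier = ${parent.identifier}
  fragment.count      = 2
  fragment.index      = ${nextInt():mod(2)}
RouteOnAttribute:
  to.sql   = ${fragment.index:equals('0')}
  to.json  = ${fragment.index:equals('1')}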
Another alternative is to use ConvertJSONToSQL set to "UPDATE", and if it fails, route those flow files to another ConvertJSONToSQL processor set to "INSERT".
