NiFi - Persist timestamp value used in ExecuteSQLRecord processor query - apache-nifi

My use-case is simple but I did not find the right solution so-far.
My query tags the data with the current timestamp in one of the columns at the moment the ExecuteSQLRecord processor hits the database and fetches the data. I want the resulting FlowFile to carry that same timestamp in its name as well, but I don't know how to capture the value produced by ${now():format("yyyyMMddHHmmss")} so I can use it later to rename the FlowFile.
Basically, I want to store the timestamp "at the time I hit the database". I cannot use an UpdateAttribute processor just before ExecuteSQLRecord to capture it, because if a prior execution is still in progress in ExecuteSQLRecord, subsequent FlowFiles will pass through UpdateAttribute, get stamped, and then sit in the queue until ExecuteSQLRecord finishes its current thread, so their timestamps would be stale.
Note: I am running NiFi in standalone mode, so I cannot run ExecuteSQLRecord with multiple threads.
Any help is highly appreciated. Thanks in advance.

ExecuteSQLRecord writes an attribute called executesql.query.duration which contains the duration of the query plus the fetch, in milliseconds.
So we can put an UpdateAttribute processor AFTER ExecuteSQLRecord that uses ${now():toNumber():minus(${executesql.query.duration})} to take the current time as epoch milliseconds and subtract the total query duration, giving the time at which the query started.
You can then use :format('yyyyMMddHHmmss') to convert it back to the timestamp format you want.
It might be a few milliseconds off of the exact time (time taken to get to the UpdateAttribute processor).
See the docs for ExecuteSQLRecord.
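As a quick sanity check on the arithmetic, here is a small Python sketch (not NiFi code) of what that expression computes: take "now" in epoch milliseconds, subtract the query duration, and format the result the way :format('yyyyMMddHHmmss') would. The duration value is a made-up example.

```python
from datetime import datetime

def query_start_timestamp(now_millis, query_duration_millis):
    """Subtract the query+fetch duration from the current epoch millis,
    then format the result as yyyyMMddHHmmss."""
    start_millis = now_millis - query_duration_millis
    return datetime.fromtimestamp(start_millis / 1000).strftime("%Y%m%d%H%M%S")

# Example: a query that took 2.5 seconds, finishing at 2021-01-02 03:04:05.500
finish = datetime(2021, 1, 2, 3, 4, 5, 500000)
now_millis = int(finish.timestamp() * 1000)
print(query_start_timestamp(now_millis, 2500))  # 2.5 s earlier: 20210102030403
```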

Related

The ExecuteSQL processor doesn't work after connecting it to another processor

When I don't connect any processor as an incoming one, ExecuteSQL works perfectly fine, as in the screenshot.
Screenshot#1
But when I connect it to another processor, no FlowFiles come out of the ExecuteSQL processor.
Screenshot#2
Does anyone know how I could make it work? Thank you in advance :-)
Check the NiFi docs and you'll find this description:
Executes provided SQL select query. Query result will be converted to Avro format. Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the select query, and the query may use the ? to escape parameters. In this case, the parameters to use must exist as FlowFile attributes with the naming convention sql.args.N.type and sql.args.N.value, where N is a positive integer. The sql.args.N.type is expected to be a number indicating the JDBC Type. The content of the FlowFile is expected to be in UTF-8 format. FlowFile attribute 'executesql.row.count' indicates how many rows were selected.
It tells you that when triggering via FlowFile you have to supply some special attributes,
something like sql.args.1.type and sql.args.1.value.
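For illustration, here is a sketch of the attribute naming convention for a parameterized query. The query and values are hypothetical; the type codes are standard JDBC type constants (12 = VARCHAR, 4 = INTEGER).

```python
# FlowFile attributes to trigger ExecuteSQL with:
#   SELECT * FROM users WHERE name = ? AND age > ?
attributes = {
    "sql.args.1.type": "12",      # JDBC type 12 = VARCHAR, for the first ?
    "sql.args.1.value": "alice",
    "sql.args.2.type": "4",       # JDBC type 4 = INTEGER, for the second ?
    "sql.args.2.value": "30",
}
```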

Scheduling NiFi Processor to run upon receiving the first flow file of the day

How can I schedule a NiFi processor to run only when it receives the first flow file of the day?
The processor can ignore all subsequent flowfiles.
You need some kind of storage to remember the previous date.
One option is to use DistributedMapCache to store the previous date.
flow:
----------------------> FetchDistributedMapCache - get prev date
-(success, not found)-> RouteOnAttribute - compare previous date with current date
-(not matched)--------> PutDistributedMapCache - store new date
----------------------> next processor that triggered on date change
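The routing logic in the flow above amounts to comparing the cached date with the current date. Here is a minimal Python sketch of that decision, with a plain dict standing in for DistributedMapCache:

```python
def should_trigger(today, cache):
    """Return True only for the first flow file seen on a new date."""
    if cache.get("prev_date") == today:
        return False            # same day: ignore the flow file
    cache["prev_date"] = today  # new day: store the date and trigger
    return True

cache = {}  # stands in for DistributedMapCache
print(should_trigger("2021-01-02", cache))  # True:  first file of the day
print(should_trigger("2021-01-02", cache))  # False: subsequent files ignored
print(should_trigger("2021-01-03", cache))  # True:  next day triggers again
```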

Nifi record counts

I am getting files from a remote server using NiFi; my files look like this:
timestamp (ms), nodeID,value
12345,x,12.4
12346,x,12.7
12348,x,13.4
12356,x,13.6
12355,y,12.0
Right now I just get, fetch, and split the lines and send them to Kafka. But beforehand I need to apply a checksum approach to my records and aggregate them based on timestamp. What I need is to add an additional column to my content and count the records based on aggregated timestamps, for example aggregating per 10 milliseconds and nodeID:
timestamp (ms), nodeID,value, counts
12345,x,12.4,3
12346,x,12.7,3
12348,x,13.4,3
12356,x,13.6,1
12355,y,12.0,1
How can I do the above in NiFi? I am totally new to NiFi but need to add this functionality to my NiFi process. I am currently using the process below.
This may not answer your question directly, but you should consider refactoring your flow to use the "record" processors. It would greatly simplify things and would probably get you closer to being able to do the aggregation.
The idea is to not split up the records, and instead process them in place. Given your current flow, the 4 processors after FetchSFTP would likely change to a single ConvertRecord processor that converts CSV to JSON. You would first need to define a simple Avro schema for your data.
Once you have the record processing setup, you might be able to use PartitionRecord to partition the records by the node id, and then from there the missing piece would be how to count by the timestamps.
Some additional resources...
https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi
https://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries
https://www.slideshare.net/BryanBende/apache-nifi-record-processing
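To make the desired aggregation concrete, here is a small Python sketch of the counting step: group rows by nodeID and a 10 ms timestamp bucket, then append each group's count as a new column. The bucket rule (integer division of the timestamp by 10) is an assumption inferred from the example data.

```python
from collections import Counter

rows = [
    (12345, "x", 12.4),
    (12346, "x", 12.7),
    (12348, "x", 13.4),
    (12356, "x", 13.6),
    (12355, "y", 12.0),
]

# Count records per (10 ms window, nodeID), then append the count to each row.
counts = Counter((ts // 10, node) for ts, node, _ in rows)
with_counts = [(ts, node, val, counts[(ts // 10, node)]) for ts, node, val in rows]

for row in with_counts:
    print(row)
# First three rows share window 1234 for node x, so their count is 3;
# the last two rows each sit alone in their (window, node) group, count 1.
```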

Get Hbase processor filter row by timestamp

I'm trying to use the HBase Get processor in NiFi, and I want to run the following command via the processor. Is that possible?
scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
The GetHBase processor is made to do incremental extraction from an HBase table based on the timestamp. The Initial Time Range property determines whether the min time starts at 0 or at the current time; after that, the processor keeps track of the max time seen in the previous execution and uses that as the min time in the next execution. So you can't provide your own time range, since the processor is managing that for you.
The GetHBase processor always looks for incremental updates based on the timestamp; basically it recognizes new/updated data automatically.
But if you still want to read rows for specific timestamp(s), you have to use a filter expression in the following format in the "Filter Expression" property:
TimestampsFilter(timestamp1, timestamp2, ..., timestampN)
You can find a list of these filters in: https://www.cloudera.com/documentation/enterprise/5-3-x/topics/admin_hbase_filtering.html

Getting execution time from QueryDatabaseTable in NiFi

I am using the QueryDatabaseTable processor in NiFi to incrementally get data from a DB2 database. QueryDatabaseTable is scheduled to run every 5 minutes. Maximum-value Columns is set to "rep" (which corresponds to a date in the DB2 db).
I have a separate MySQL database I want to update with the value of "rep" that QueryDatabaseTable uses to query the DB2 database. How can I get this value?
In the log files I found that the FlowFile attributes do not contain this value.
QueryDatabaseTable doesn't currently accept incoming flow files or allow the use of Expression Language to define the table name, I've written up an improvement Jira to handle this:
https://issues.apache.org/jira/browse/NIFI-2340