Nifi add attribute from DB - apache-nifi

I am currently getting files from FTP in NiFi, but I have to check some conditions before I fetch a file. The scenario goes something like this:
List FTP -> Check Condition -> Fetch FTP
In the Check Condition part, I have to fetch some values from the DB and compare them with the file name. Can I use UpdateAttribute to fetch some records from the DB, so the flow looks like this?
List FTP -> Update Attribute (from DB) -> Route on Attribute -> Fetch FTP

I think your flow would look something like this:
Flow:
1. ListFTP // to list the files
2. ExecuteSQL // to execute a query in the DB (sample query: select max(timestamp) db_time from table)
3. ConvertAvroToJSON // convert the ExecuteSQL result to JSON format
4. EvaluateJsonPath // set Destination to flowfile-attribute and add a new property db_time with the value $.db_time
5. RouteOnAttribute // compare the filename timestamp with the extracted timestamp using NiFi Expression Language
6. FetchFTP // if the condition is true, fetch the file
RouteOnAttribute Configs:
I have assumed filename is something like fn_2017-08-2012:09:10 and executesql has returned 2017-08-2012:08:10
Expression:
${filename:substringAfter('_'):toDate("yyyy-MM-ddHH:mm:ss"):toNumber()
:gt(${db_time:toDate("yyyy-MM-ddHH:mm:ss"):toNumber()})}
In this expression, filename is the attribute set by ListFTP and db_time is the attribute added by the EvaluateJsonPath processor. Both timestamps are converted to numbers and then compared.
See the NiFi Expression Language guide for more details.
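To make the comparison concrete, here is a minimal Python sketch of what the expression computes, outside NiFi. The function names substring_after and should_fetch are hypothetical, and the sample values are the ones assumed above.

```python
from datetime import datetime

# Python equivalent of the NiFi pattern "yyyy-MM-ddHH:mm:ss"
# (no separator between the date and the time).
FMT = "%Y-%m-%d%H:%M:%S"


def substring_after(value: str, sep: str) -> str:
    """Mimic substringAfter: text after the first sep, or the whole value if sep is absent."""
    head, found, tail = value.partition(sep)
    return tail if found else value


def should_fetch(filename: str, db_time: str) -> bool:
    """Return True when the timestamp embedded in the filename is later than db_time."""
    file_ts = datetime.strptime(substring_after(filename, "_"), FMT)
    db_ts = datetime.strptime(db_time, FMT)
    return file_ts > db_ts


print(should_fetch("fn_2017-08-2012:09:10", "2017-08-2012:08:10"))
```

With the assumed values the file is one minute newer than db_time, so it would be routed on to the fetch step.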

If I understand your use case correctly, you are using the external DB only for tracking purposes, so the latest processed timestamp should be enough. In that case, I would suggest using the DistributedMapCache processors and controller services offered by NiFi instead of relying on an external DB.
With this method, your flow would be like:
ListFile --> FetchDistributedMapCache --(success)--> RouteOnAttribute -> FetchFile
Configure FetchDistributedMapCache
Cache Entry Identifier - This is the key for your Cache. Set it to something like lastProcessedTime
Put Cache Value In Attribute - Whatever name you give here will be added as a FlowFile attribute with its value being the Cache value. Provide a name, like latestTimestamp or lastProcessedTime
Configure RouteOnAttribute
Create a new dynamic relationship by clicking the (+) button in the Properties tab. Give it a name, like success or matches. Let's assume your filenames are of the format somefile_1534824139, i.e. a name, an _, and an epoch timestamp appended.
In that case, you can leverage NiFi Expression Language and the functions it offers. For the new dynamic relationship, you can have an expression like:
success - ${filename:substringAfter('_'):gt(${lastProcessedTimestamp})}
This is with the assumption that, in FetchDistributedMapCache, you have configured the property Put Cache Value In Attribute with the value lastProcessedTimestamp.
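The same check can be sketched in Python to show what the expression evaluates. This is only an illustration under the assumptions above (an epoch suffix after the first underscore); the dict stands in for the distributed cache, and is_newer is a hypothetical name.

```python
def is_newer(filename: str, last_processed_timestamp: str) -> bool:
    """Compare the epoch suffix of the filename with the cached timestamp.

    Both values arrive as strings (as FlowFile attributes would) and are
    compared numerically, like the gt function in the expression.
    """
    suffix = filename.partition("_")[2]  # text after the first underscore
    return int(suffix) > int(last_processed_timestamp)


# A plain dict standing in for the cache entry (assumed key name).
cache = {"lastProcessedTime": "1534824000"}

print(is_newer("somefile_1534824139", cache["lastProcessedTime"]))
```

A filename without an underscore or with a non-numeric suffix raises ValueError here; in a real flow such files would need their own route.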
Useful Links
https://community.hortonworks.com/questions/83118/how-to-put-data-in-putdistributedmapcache.html
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#dates

Related

NiFi - Call Rest API for every row in the file

I have a dataset of IDs, and I've got a flow file that has one row per ID. I have an API that takes this ID as a parameter, and I want to harvest the results for all rows back into NiFi (example below).
https://service.com/api/thing/{ID}
How can I call this API in NiFi for all IDs in my dataset, ideally using some parallelism if possible?
(for reference, in SSIS I could load these IDs into an array and then loop over an API call with a parameter for the ID).
First, use SplitText to get each ID as a separate flowfile.
Then copy the content to an attribute with ExtractText. Add a custom property, such as 'message.body' in this example,
so that ExtractText adds a message.body.0 attribute to the flowfile, which you can reference in InvokeHTTP, for example in the Remote URL as https://service.com/api/thing/${message.body.0}. Please note that since your endpoint is https, you may need to configure an SSL Context Service.
Finally, you can set the concurrent task count for each processor for parallelism.
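For comparison, the split-then-call pattern can be sketched in plain Python. This is not how NiFi implements it; it only illustrates one request per row with a small worker pool. The names fetch and harvest are hypothetical, and the URL pattern is the one from the question.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict
from urllib.request import urlopen

BASE_URL = "https://service.com/api/thing/{id}"  # URL pattern from the question


def fetch(record_id: str) -> str:
    """Call the API for a single ID and return the response body as text."""
    with urlopen(BASE_URL.format(id=record_id), timeout=10) as resp:
        return resp.read().decode("utf-8")


def harvest(content: str,
            fetch_one: Callable[[str], str] = fetch,
            workers: int = 4) -> Dict[str, str]:
    """Split the content into one ID per line and call fetch_one for each ID in parallel."""
    ids = [line.strip() for line in content.splitlines() if line.strip()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with ids.
        results = list(pool.map(fetch_one, ids))
    return dict(zip(ids, results))
```

The workers argument plays roughly the role of the concurrent task count: it caps how many requests are in flight at once.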

trying to add a field based on conditionals with NifI

I'm new to Apache NiFi and currently using it to route message data to various locations. I'm looking to add some fields based on a set of conditionals.
Currently I have a GetFile Processor that reads log files ---> ExtractGrok that applies a grok pattern to parse ---> ConvertRecord to convert from Grok to Json. The next part is where I'm stumped/not sure what to do next.
In my JSON I have a field refresh_time. I need to create 2 new fields based on some conditions on refresh_time,
something along the lines of if refresh_time < 10 then cache = 1; else if refresh_time > 10 then reprocess = 1
The end goal here is numeric fields cache and refresh_time that can be used down the road in aggregations.
What would be the best way to add 2 numerical fields based on a condition. Is there a processor for adding additional fields or updating the record to include new fields?
Thanks.
There's a couple ways you could achieve what you want to.
One option (More readable)
A QueryRecord would let you write a SQL statement across your Records and let you split them by the result. E.g.
Add a dynamic property called cache with a value SELECT * FROM FLOWFILE WHERE refresh_time < 10.
Add a dynamic property called refresh with a value SELECT * FROM FLOWFILE WHERE refresh_time > 10.
The QueryRecord will now have the relationships failure, original, cache and refresh.
Branching off from cache and refresh will be one UpdateRecord each, with Replacement Value Strategy set to Literal Value.
For the cache relationship, you can add a new dynamic property called cache with a value 1. For the refresh relationship, you can add a new dynamic property called refresh with a value 1.
Similar option (Possibly more performant)
If you want to avoid the additional UpdateRecord, you can add fields in the QueryRecord with something like this:
Two dynamic properties set as:
cache = SELECT *, 1 AS cache FROM FLOWFILE WHERE refresh_time < 10
reprocess = SELECT *, 1 AS reprocess FROM FLOWFILE WHERE refresh_time > 10
This option may be more performant due to fewer disk reads.
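The effect of the two queries can be sketched in Python, assuming the field is named refresh_time as in the question. The function name split_records is hypothetical; it only mirrors the routing logic, not QueryRecord itself.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_records(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Mirror the two queries: one output per condition, each with a new numeric field."""
    cache = [{**r, "cache": 1} for r in records if r["refresh_time"] < 10]
    reprocess = [{**r, "reprocess": 1} for r in records if r["refresh_time"] > 10]
    return cache, reprocess


records = [{"id": "a", "refresh_time": 3},
           {"id": "b", "refresh_time": 10},
           {"id": "c", "refresh_time": 42}]
cache, reprocess = split_records(records)
print(cache)
print(reprocess)
```

Note that a record with refresh_time exactly equal to 10 matches neither condition, so it appears in neither output. If that case matters, one of the comparisons would need to become <= or >=.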
This gist is an example of the second option, you can import it to NiFi to try it out.
Also, FYI there is a GrokReader that you could use in ConvertRecord to parse with Grok straight to JSON, potentially skipping the ExtractGrok.

How to set start and end row or interval rows for CSV in Nifi?

I want to get a particular part of an Excel file in NiFi. My NiFi template looks like this:
GetFileProcessor
ConvertExcelToCSVProcessor
PutDatabaseRecordProcessor
I should parse data between step 2 and 3.
Is there a way to get specific rows and columns?
Note: If there is an option that lets me drop ConvertExcelToCSVProcessor, that will work for me too.
You can use Record processors between ConvertExcelToCSV and PutDatabaseRecord.
To remove or override a column, use UpdateRecord. This processor can receive your data via a CSVReader and prepare output for PutDatabaseRecord or QueryRecord. Check View usage -> Additional Details...
To filter by column, use QueryRecord.
Here is an example. It receives data through a CSVReader and does some aggregations; you can also do some filtering, according to the docs.
This post also helped me understand Records in NiFi.
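As an illustration of the kind of row and column selection being asked for, here is a minimal Python sketch over CSV text. It is not a NiFi configuration; slice_csv and its parameters are hypothetical names, and row numbers are assumed to be 1-based and to count data rows only (the header is excluded).

```python
import csv
import io
from typing import List


def slice_csv(text: str, start_row: int, end_row: int, columns: List[str]) -> str:
    """Keep only data rows start_row..end_row (inclusive) and the named columns."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    # extrasaction="ignore" drops any column not listed in fieldnames.
    writer = csv.DictWriter(out, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for number, row in enumerate(reader, start=1):
        if start_row <= number <= end_row:
            writer.writerow(row)
    return out.getvalue()


sample = "id,name,city\n1,a,x\n2,b,y\n3,c,z\n"
print(slice_csv(sample, 2, 3, ["id", "city"]))
```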

Apache Nifi, can I collect an attribute from multiple flow files

I have a nifi flow that takes in .csv files and partitions each into multiple records with each csv column value added as an attribute.
At one point in the flow, I'd like to collect the value of one attribute from each record that passes through. There could be from 0 to n collected. Once I have the list, it'll be emailed out.
I'm trying to avoid me (or someone else) getting bombed with emails if there are 200+ bad records in a file. So if I could collect for a fixed period of time or until another attribute (filename) changes, that would be great.
I've tried merge content and record. I even tried replace text to replace the content w/ just the attribute value I want to save and merging those, and a slew of other things.
Is there a simple way to do this in nifi?
Have you tried UpdateAttribute with a new attribute that holds an array? Each time a flowfile passes through this processor, you could update the value of this attribute by appending a new value to the array.
However, as @daggett pointed out, it would be helpful if you could provide the input and the expected output.
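To pin down the expected output, the desired result can be sketched in Python: one list of collected values per filename, so that each file produces a single email rather than one per bad record. This only illustrates the grouping and is not a NiFi configuration; collect_by_filename and the bad_value attribute are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def collect_by_filename(flowfiles: Iterable[Dict[str, str]],
                        attribute: str) -> Dict[str, List[str]]:
    """Group the values of one attribute by the filename attribute."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for attrs in flowfiles:
        if attribute in attrs:  # records without the attribute contribute nothing
            grouped[attrs["filename"]].append(attrs[attribute])
    return dict(grouped)


flowfiles = [
    {"filename": "a.csv", "bad_value": "row 3"},
    {"filename": "a.csv", "bad_value": "row 7"},
    {"filename": "b.csv"},
    {"filename": "b.csv", "bad_value": "row 1"},
]
print(collect_by_filename(flowfiles, "bad_value"))
```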

Passing parameter from different source into insert statement using Nifi

I'm still new in NiFi. What I want to achieve is to pass a parameter from a different source.
Scenario:
I have 2 data sources: JSON data and a record id (from an Oracle function). I declared the record id using ExtractText as "${recid}", and the JSON string is in the default "$1".
How can I insert both into a table using a SQL statement such as insert into table1 (json,recid) values ('$1','${recid}')?
After I run the processor, I'm not able to get both attributes into one insert statement.
Please help.
Nifi flowfile
Flowfile after mergecontent
You should merge these 2 flowfiles into one.
Use the MergeContent processor with Attribute Strategy set to Keep All Unique Attributes.
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.MergeContent/index.html
Take a look at LookupAttribute with a SimpleDatabaseLookupService. You can pass your JSON flow file into that, look up the recid into an attribute, then do the ExtractText -> ReplaceText to get it into SQL form.
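The underlying idea, getting both values onto one unit of work before building the insert, can be sketched in Python. This is an illustration only: sqlite3 stands in for the Oracle database, the attribute dicts stand in for the two flowfiles, and insert_record is a hypothetical name. Bound parameters are used instead of pasting the values into the SQL string.

```python
import sqlite3


def insert_record(conn: sqlite3.Connection, json_text: str, recid: str) -> None:
    """Insert both values in one statement using bound parameters."""
    conn.execute("INSERT INTO table1 (json, recid) VALUES (?, ?)",
                 (json_text, recid))


# In-memory database standing in for the real target table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (json TEXT, recid TEXT)")

# Attributes from the two sources, merged into one set before the insert.
json_attrs = {"json": '{"name": "example"}'}
recid_attrs = {"recid": "1001"}
merged = {**json_attrs, **recid_attrs}

insert_record(conn, merged["json"], merged["recid"])
rows = conn.execute("SELECT json, recid FROM table1").fetchall()
print(rows)
```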
