NiFi: Get the names of all processors involved in a particular run

I have a NiFi template of 30 processors. There are multiple conditional branches in the template. Now I want to add something at the end of the template so that I can get a list of the names of all processors that executed for a particular run.
How can I do this?
Thanks,

You could technically insert an UpdateAttribute processor after every "operational" processor to add an attribute recording the most recent processor, but @Bryan is correct that the provenance feature exists to provide this information automatically. If you need to operate on it, you can use the SiteToSiteProvenanceReportingTask to send that data to a Remote Process Group (linked to an Input Port on the same instance) and then treat it like any other data in NiFi and examine/transform it.
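As a sketch of that last step: once the reporting task is delivering provenance events to an Input Port, each batch is just JSON you can process anywhere. Below is a minimal Python example that groups events by flowfile and lists the processor names it passed through; the field names ("entityId" for the flowfile UUID, "componentName" for the processor) are assumptions to verify against the event schema of your NiFi version.

    import json
    from collections import defaultdict

    def processors_per_flowfile(event_batch_json):
        # Map each flowfile UUID to the processors that emitted events for it.
        events = json.loads(event_batch_json)
        seen = defaultdict(list)
        for event in events:
            # "entityId" / "componentName" are assumed field names; check the
            # SiteToSiteProvenanceReportingTask output before relying on them
            seen[event.get("entityId")].append(event.get("componentName"))
        return dict(seen)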

Related

Find Provenance Data For Flowfile Within a Processor

I am attempting to develop a NiFi processor that would extend the functionality of the built-in processor "Monitor Activity".
The problem I am attempting to solve is that in my application, multiple flows enter the processor, and the processor alerts by email when no flowfiles arrive within a certain time period. However, if only one of the flows stops, no alert will be triggered.
I would like to modify the processor such that it would be able to distinguish between the different flows and alert accordingly.
In order to do this, I would need a way to differentiate between flowfiles originating from one processor and another.
I am aware NiFi keeps detailed provenance records that can be easily accessed from the GUI, but I'm unable to find an easy way of accessing this information programmatically from within processor code.
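One programmatic route that avoids processor internals is the provenance REST API, which is what the GUI itself calls. A hedged Python sketch follows; the exact request/response field names vary by NiFi version, so treat the shapes below as assumptions to verify against the /nifi-api documentation:

    import time
    import requests

    BASE = "https://nifi-host:9443/nifi-api"  # hypothetical instance

    def query_provenance(component_id, max_results=100):
        # submit an asynchronous provenance query, poll it, then clean it up
        body = {"provenance": {"request": {
            "maxResults": max_results,
            "searchTerms": {"ProcessorID": component_id},  # assumed term name
        }}}
        query = requests.post(f"{BASE}/provenance", json=body).json()
        query_id = query["provenance"]["id"]
        try:
            while not query["provenance"]["finished"]:
                time.sleep(0.5)
                query = requests.get(f"{BASE}/provenance/{query_id}").json()
            return query["provenance"]["results"]["provenanceEvents"]
        finally:
            # provenance queries are server-side resources and must be deleted
            requests.delete(f"{BASE}/provenance/{query_id}")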

NiFi: How to get the current processor name and process group name from a custom processor (Java)

I'm creating a NiFi custom processor in Java.
One of the requirements is to get the previous processor name and process group name (like a breadcrumb) in Java code.
The previous processor name and process group name are not immediately (nor meant to be) available to processors; can you explain more about your use case? You can perhaps use a SiteToSiteProvenanceReportingTask to send provenance information back to your own NiFi instance (an Input Port, for example) and find the events that correspond to FlowFiles entering your custom processor; the events should have the source (previous) processor and destination (your custom) processor.
If instead you code your custom processor using InvokeScriptedProcessor with Groovy, for example, then you can "bend the rules" and get at the previous processor name and such: Groovy allows access to private members, and you can assume the implementation of the ProcessContext in onTrigger is an instance of StandardProcessContext, so you can get at its members, which include upstream connections and thus the previous processor. For a particular FlowFile, though, I'm not sure you can use this approach to know which upstream processor it came from.
Alternatively, you could add an UpdateAttribute after each "previous processor" to set attribute(s) with the information about that processor, but that has to be hardcoded and applied to every corresponding part of the flow.
I faced this some time back. I used the InvokeHTTP processor with the nifi-api/process-groups/${process_group_id} web service.
This is how I implemented it:
Identify the process group where the error handling should be done. [Action Group]
Create a new process group [Error Handling Group] next to the Action Group and add a relationship to transfer files to the Error Handling Group.
Use the InvokeHTTP processor and set HTTP Method to GET
Set Remote URL to http://{nifi-instance}:{port}/nifi-api/process-groups/${action_group_process_group_id}
You will get a response in JSON which you will have to customize according to your needs.
Please let me know if you need the XML file that I am using; I can share it. It works fine for me.
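For reference, here is the same call made from Python rather than InvokeHTTP; the host, port, and process group id are placeholders for your own instance:

    import requests

    NIFI = "http://nifi-instance:8080/nifi-api"  # your {nifi-instance}:{port}
    pg_id = "action-group-id"                    # the Action Group's process group id

    resp = requests.get(f"{NIFI}/process-groups/{pg_id}")
    resp.raise_for_status()
    group = resp.json()
    # trim the JSON down to whatever your error-handling flow needs,
    # e.g. the group's name:
    print(group["component"]["name"])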

nifi-api: List all processors with their configuration

I want to list all my ListenHTTP processor URLs so I can select and kick off different flows.
Is it possible with a NiFi API query to list all processors with their configuration (in my case I'm looking to get 'Base Path' and 'Listening Port')?
Looking for a query that will return this info only (not the full processor details).
I can get an individual processor by name.
https://<IP-ADDRESS>:9443/nifi-api/flow/search-results?q=MyProcessor
Then parse out the processor's id from this result.
And with id get the processor's full details.
https://<IP-ADDRESS>:9443/nifi-api/processors/<PROCESSOR-ID>
But then I would have to parse out the config properties (and would have to repeat for each processor).
This seems a roundabout way of solving the problem.
Any help would be much appreciated.
Thanks
EDIT:
The best solution I can see at the moment is still a two-step approach.
Get everything for ListenHTTP:
https://<IP-ADDRESS>:9443/nifi-api/flow/search-results?q=ListenHTTP
This returns several JSON arrays; we want 'processorResults'.
Parse this (in Java) to get each processor's name and id.
Then (as above) get the processor by id and parse out the config:
https://<IP-ADDRESS>:9443/nifi-api/processors/<PROCESSOR-ID>
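For what it's worth, here is the two-step approach as a Python sketch (the question mentions Java, but the two REST calls are identical); 'Base Path' and 'Listening Port' are the ListenHTTP property names as they appear in the config map:

    import requests

    BASE = "https://<IP-ADDRESS>:9443/nifi-api"

    # step 1: search, and keep only the processor hits
    search = requests.get(f"{BASE}/flow/search-results",
                          params={"q": "ListenHTTP"}, verify=False).json()
    for hit in search["searchResultsDTO"]["processorResults"]:
        # step 2: fetch each processor and pull out the two properties
        proc = requests.get(f"{BASE}/processors/{hit['id']}", verify=False).json()
        props = proc["component"]["config"]["properties"]
        print(hit["name"], props.get("Base Path"), props.get("Listening Port"))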
You can use Python and NiPyAPI to recurse through the flow and get all the processors, then filter on ListenHTTP processors. You can also use NiPyAPI to kick off the desired flows; it is a very handy tool.
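A sketch of that with NiPyAPI (assuming it is installed and pointed at your instance):

    import nipyapi

    nipyapi.config.nifi_config.host = "https://<IP-ADDRESS>:9443/nifi-api"

    # recurse the whole canvas, then keep only ListenHTTP processors
    listeners = [p for p in nipyapi.canvas.list_all_processors()
                 if p.component.type.endswith("ListenHTTP")]
    for p in listeners:
        props = p.component.config.properties
        print(p.component.name, props.get("Base Path"), props.get("Listening Port"))

    # to kick off a flow, start (schedule) the matching processor
    nipyapi.canvas.schedule_processor(listeners[0], True)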

NiFi processor to route flows based on a changeable list of regexes

I am trying to use NiFi to act as a router for syslog, based on a list of regexes matched against the syslog body (NB: as this is just a proof of concept, I can change any part if needed).
The thought process is that via a separate system (for now, vi and a text file 😃) an admin can define a list of criteria (regex format for each seems sensible) which, if matched, would result in syslog messages being sent to a specific separate system. For example, all critical audit data (matched by the regex list) is sent to the audit system, and all other data goes to the standard log store.
I know that this can be done with content-routing processors (e.g. RouteOnContent), but the properties are configured before the processor starts, and an admin would have to stop the processor every time they need to make an edit.
I would like to load the list of regexes in periodically (automatically) and have the processor properties be updated.
I don't mind if this is done natively in NiFi (that is preferable, for elegance and to save writing an external app) or via a REST API call driven by a Python script or something (or can NiFi send REST calls to itself?!).
I appreciate that a processor property cannot be updated while running, so the processor would have to be stopped to be updated, but that's fine as the queue will buffer for the brief period. Maybe a check to see whether the file has changed could avoid restarts for no reason, rather than updating periodically regardless; I can solve that problem later.
Thanks
Chris
I think the easiest solution would be to use ScanContent, a processor which reads a dictionary file on disk containing a list of search terms, monitoring the file for changes and reloading it when that happens. The processor then applies the search terms to the content of incoming flowfiles and lets you route them based on matches. While this processor doesn't support regular expressions as dictionary terms, you could make a slight modification to the code or use it as a baseline for a custom processor with those changes.
If that doesn't work for you, there are a number of LookupService implementations which show how CSV, XML, property files, etc. can be monitored and read by the controller framework to provide an updated mapping of key/value pairs. These can also serve as a foundation for building a more complicated scan/match flow using the loaded terms/patterns.
Finally, if you have to rely on direct processor property updating, you can script this with the NiFi API calls to stop, update, and restart the processors so it can be done in near-real-time. To determine these APIs, visit the API documentation or execute the desired tasks via the UI in your browser and use the Developer Tools to capture the HTTP requests being made.
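As a hedged sketch of that last option, the stop/update/restart cycle over the REST API looks roughly like this in Python; the revision bookkeeping is required or NiFi will reject the update, and "Regular Expression" stands in for whichever routing property you actually maintain:

    import requests

    BASE = "http://nifi-host:8080/nifi-api"  # hypothetical instance
    PROC = "processor-id"                    # the routing processor's id

    def run_status(state):
        # fetch the current revision, then transition the processor
        proc = requests.get(f"{BASE}/processors/{PROC}").json()
        requests.put(f"{BASE}/processors/{PROC}/run-status",
                     json={"revision": proc["revision"], "state": state}
                     ).raise_for_status()

    run_status("STOPPED")

    # re-fetch for a fresh revision, then push the updated property
    proc = requests.get(f"{BASE}/processors/{PROC}").json()
    update = {"revision": proc["revision"],
              "component": {"id": PROC,
                            "config": {"properties": {
                                "Regular Expression": "new|pattern"}}}}
    requests.put(f"{BASE}/processors/{PROC}", json=update).raise_for_status()

    run_status("RUNNING")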

NiFi FetchFile processor doesn't allow dynamic attributes

What is the reason that some NiFi processors don't allow dynamic attributes? I'm using the FetchFile processor in one of my workflows, and I need to pass some data through the flow to be able to use it in the last step. However, FetchFile breaks it by not allowing dynamic attributes. I'm wondering if there is another way to do it? Why would NiFi not allow dynamic attributes on certain processors?
My flow is something like
ExecuteScript -> EvaluateJson -> Custom processor to write files -> FetchFile -> SendToS3 -> Mark workflow complete
I want to send some metadata so that I could mark the workflow complete. I'm passing that data as attributes but it breaks at FetchFile.
There are two separate concepts, user-defined properties on processors, and flow file attributes.
User-defined properties let a processor take input from a user for something that couldn't be defined ahead of time. Examples of this are in EvaluateJsonPath when the JSON paths are specified in user-defined properties, or in PutSolrContentStream when all the user-defined properties get passed as query parameters to Solr.
FlowFile attributes are a map of key/value pairs that get passed around with each piece of data. These attributes are usually created when a processor produces or modifies a flow file, or can be manipulated using processors like UpdateAttribute.
It is up to each processor to decide whether it needs user-defined properties and how they would be used. UpdateAttribute happens to be a processor where the user-defined properties are added as new key/value pairs to each flow file, but it doesn't make sense for every processor to do that.
