nifi-api: List all processors with their configuration - apache-nifi

I want to list all my ListenHTTP processor URLs so I can select and kick off different flows.
Is it possible with Nifi API query to list all processors with their configuration (in my case looking to get 'Base Path' and 'Listening Port') ?
Looking for a query that will return this info only (not the full processor details).
I can get an individual processor by name.
https://<IP-ADDRESS>:9443/nifi-api/flow/search-results?q=MyProcessor
Then parse out the processor's id from this result.
And with id get the processor's full details.
https://<IP-ADDRESS>:9443/nifi-api/processors/<PROCESSOR-ID>
But then I would have to parse out the config properties (and would have to repeat for each processor).
This seems a roundabout way of solving the problem.
Any help would be much appreciated.
Thanks
**** EDIT:
Best solution I can see at the moment is still a 2 step approach.
Get everything for ListenHTTP
https://<IP-ADDRESS>:9443/nifi-api/flow/search-results?q=ListenHTTP
This will return multiple Json arrays, where we want the 'processorResults'
Parse this (in Java) to get processor name and id.
Then (as above) get processor by 'id' and parse out config.
https://<IP-ADDRESS>:9443/nifi-api/processors/<PROCESSOR-ID>

You can use Python and NiPyAPI to recurse through the flow and get all the processors, then you'd filter on ListenHttp processors. You can also use NiPyAPI to kick off the desired flows, it is a very handy tool.

Related

NiFi How to get the current processor Name and Processor group name through the custom processor using (Java)

I'm Creating the NiFi Custom processor using Java,
one of the requirement is to get the previous processor name and processor group (like a breadcrumb) using java code.
The previous processor name and process group name is not immediately (nor meant to be) available to processors, can you explain more about your use case? You can perhaps use a SiteToSiteProvenanceReportingTask to send provenance information back to your own NiFi instance (an Input Port, e.g.) and find the events that correspond to FlowFiles entering your custom processor, the events should have the source (previous) processor and destination (your custom) processor.
If instead you code your custom processor using InvokeScriptedProcessor with Groovy for example, then you can "bend the rules" and get at the previous processor name and such, as Groovy allows access to private members and you can assume the implementation of the ProcessContext in onTrigger is an instance of StandardProcessContext, so you can get at its members which include upstream connections and thus the previous processor. For a particular FlowFile though, I'm not sure you can use this approach to know which upstream processor it came from.
Alternatively, you could add an UpdateAttribute after each "previous processor" to set attribute(s) with the information about that processor, but that has to be hardcoded and applied to every corresponding part of the flow.
I faced this some time back. I used InvokeHTTP processor and used nifi-api/process-groups/${process_group_id} Web Service
This is how I implemented:
Identify the process group where the error handling should be done. [Action Group]
Create a new process group [Error Handling Group] next to the Action Group and add relationship to transfer files to Error Handling Group.
Use the InvokeHTTP processor and set HTTP Method to GET
Set Remote URL to http://{nifi-instance}:{port}/nifi-api/process-groups/${action_group_process_group_id}
You will get response in JSON which you will have to customize according to your needs
Please let me know if you need the XML file that I am using. I can share that. It just works fine for me

In nifi usgae of Evaluate jsonpath processor will it affect performance impact because of attribute creation

I'm trying to integrate nifi REST API's with my application. So by mapping input and output from my application, I am trying to call nifi REST api for flow creation. So, in my use case most of the times I will extract the JSON values and will apply expression languages.
So, for simplifying all the use-cases I am using evaluate JSONpath processor for fetching all attributes using jsonpath and apply expression language function on that in extract processor. Below is the flow diagram regarding that.
Is it the right approach because for JSON to JSON manipulation having 30 keys this is the simplest way, and as I am trying to integrate nifi REST API's with my application I cannot generate JOLT transformation logic dynamically based on the user mapping.
So, in this case, does the usage of evaluating JSONpath processor creates any performance issues for about 50 use case with different transformation logic because as I saw in documentation attribute usage creates performance(regarding memory) issues.
Your concern about having too many attributes in memory should not be an issue here; having 30 attributes per flowfile is higher than usual, but if these are all strings between 0 - ~100-200 characters, there should be minimal impact. If you start trying to extract KB worth of data from the flowfile content to the attributes on each flowfile, you will see increased heap usage, but the framework should still be able to handle this until you reach very high throughput (1000's of flowfiles per second on commodity hardware like a modern laptop).
You may want to investigate ReplaceTextWithMapping, as that processor can load from a definition file and handle many replace operations using a single processor.
It is usually a flow design "smell" to have multiple copies of the same flow process with different configuration values (with the occasional exception of database interaction). Rather, see if there is a way you can genericize the process and populate the relevant values for each flowfile using variable population (from the incoming flowfile attributes, the variable registry, environment variables, etc.).

NiFi: Get all the processors name involved in a particular run

I have a nifi template of 30 processors. There are multiple conditional branches are there in the template. Now, I want to add something at the end of template so that I can get the list of all processors name which has executed for a particular run.
How can do this?
Thanks,
You could technically insert an UpdateAttribute processor after every "operational" processor which would add an attribute containing the most recent processor, but #Bryan is correct that the provenance feature exists to provide this information automatically. If you need to operate on it, you can use the SiteToSiteProvenanceReportingTask to send that data to a Remote Process Group (linked to an Input Port on the same instance) and then treat that data as any other in NiFi and examine/transform it.

How to conditionally process FlowFile's by a MongoDB query result?

I need to process a list of files based on the result of a MongoDB query, but I can't find any processor that would let me do that. I basically have to take each file and process it or completely discard based on the result of a query that involves that file attributes.
The only MongoDB-related processor that I see in NiFi 1.50 is GetMongo, which apparently can't receive connections, but only emit FlowFiles based on the configured parameters.
Am I looking in the wrong place?
NIFI-4827 is an Improvement Jira that aims to allow GetMongo to accept incoming flow files, the content would contain the query, and the properties will accept Expression Language. The code is still under review, but the intent is to make it available in the upcoming NiFi 1.6.0 release.
As a possible workaround in the meantime, if there is a REST API you could use InvokeHttp to make the call(s) manually and parse the result(s). Also if you have a JDBC driver for MongoDB (such as Unity), you might be able to use ExecuteSQL.

Connecting NiFi to ElasticSearch

I'm trying to solve one task and will appreciate any help - links to documentation, or links to forums, or other FAQs besides https://cwiki.apache.org/confluence/display/NIFI/FAQs, or any meaningful answer in this post =) .
So, I have the following task:
Initial part of my system collects data each 5-15 min from different DB sources. Then I remove duplicates, remove junk, combine data from different sources according to logic and then redirect it to second part of the system as several streams.
As far as I know, "NiFi" can do this task in the best way =).
Currently I can successfully get information from InfluxDB by "GetHTTP" processor. However I can't configure same kind of processor for getting information from Elastic DB with all necessary options. I'd like to receive data each 5-15 minutes for time period from "now-minus-<5-15 minutes>" to "now". (depends on scheduler period) with several additional filters. If I understand it right, this can be achieved either by subscription to "_index" or by regular requests to DB with desired interval.
I know that NiFi has several specific Processors designed for Elasticsearch (FetchElasticsearch5, FetchElasticsearchHttp, QueryElasticsearchHttp, ScrollElasticsearchHttp) as well as GetHTTP and PostHTTP Processors. However, unfortunately, I have lack of information or even better - examples - how to configure their "Properties" for my purposes =(.
What's the difference between FetchElasticsearchHttp, QueryElasticsearchHttp? Which one fits better for my task? What's the difference between GetHTTP and QueryElasticsearchHttp besides several specific fields? Will GetHTTP perform the same way if I tune it as I need?
Any advice?
I will be grateful for any help.
The ElasticsearchHttp processors try to make it easier to interact with ES by generating the appropriate REST API call based on the properties you set. If you know the full URL you need, you could use GetHttp or InvokeHttp. However the ESHttp processors let you put in just the stuff you're looking for, and it will generate the URL and return the results.
FetchElasticsearch (and its variants) is used to get a particular document when you know the identifier. This is sometimes used after a search/query, to return documents one at a time after you know which ones you want.
QueryElasticsearchHttp is for when you want to do a Lucene-style query of the documents, when you don't necessarily know which documents you want. It will only return up to the value of index.max_result_window for that index. To get more records, you can use ScrollElasticsearchHttp afterwards. NOTE: QueryElasticsearchHttp expects a query that will work as the "q" parameter of the URL. This "mini-language" does not support all fields/operators (see here for more details).
For your use case, you likely need InvokeHttp in order to issue the kind of query you describe. This article describes how to issue a query for the last 15 minutes. Once your results are returned, you might need some combination of EvaluateJsonPath and/or SplitJson to work with the individual documents, see the Elasticsearch REST API documentation (and NiFi processor documentation) for more details.

Resources