Apache NiFi to split data based on condition - apache-nifi

Our requirement is to split the flow data based on a condition.
We thought of using the "ExecuteStreamCommand" processor for this (which in turn would call a Java class), but it produces only a single flowfile. We would like two flowfiles: one for data that matches the criteria and one for data that does not.
I looked at the "RouteText" processor, but it has no way to call a Java class.
Let me know if anyone has any suggestion.

I think you could use GetMongo to read those definition values and store them in a map accessed by DistributedMapCacheClientService, then use RouteOnContent to route the incoming flowfiles based on the absence/presence of the retrieved values.
If that doesn't work, you could instead route the query result from GetMongo to PutFile and then use ScanContent, which reads from a dictionary file on the file system and routes flowfiles based on the absence/presence of those keywords in the content.
Finally, if all else fails, you can use ExecuteScript to combine those steps into a single processor and route to matched/unmatched relationships. It processes Groovy code easily, so you can directly invoke your existing Java class if necessary.
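As a rough illustration of the matched/unmatched split that the last suggestion describes, here is a minimal sketch in plain Python. It is not NiFi API code: split_by_condition and the sample keyword set are hypothetical names, and the predicate stands in for whatever check your existing Java class performs.

```python
from typing import Callable, Iterable, List, Tuple


def split_by_condition(
    records: Iterable[str], predicate: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Partition records into (matched, unmatched), preserving input order."""
    matched: List[str] = []
    unmatched: List[str] = []
    for record in records:
        # Each record goes to exactly one of the two outputs.
        if predicate(record):
            matched.append(record)
        else:
            unmatched.append(record)
    return matched, unmatched


if __name__ == "__main__":
    # Hypothetical keyword set standing in for the looked-up definition values.
    keywords = {"alpha", "gamma"}
    hits, misses = split_by_condition(
        ["alpha", "beta", "gamma"], lambda r: r in keywords
    )
    print("matched:", hits)
    print("unmatched:", misses)
```

Inside ExecuteScript the two lists would correspond to the two relationships you transfer flowfiles to.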

Related

How can we use UpdateAttribute processor for assigning variable of FlowFile content in Apache Nifi

I need some help with the UpdateAttribute processor:
I have a CSV file which contains hostnames. I need to separate each hostname in the FlowFile and pass it as a variable to a REST API.
My REST API part works fine when I pass the data manually. However, I could not work out how to pass the hostname as a variable value.
Sharing sample file:
SRHAPP001,SRHWEBAPP002,SRHDB006,SRHUATAPP4,ARHUATDB98
I don't quite understand your goal, but I assume you are trying to pass the hostname to your REST API call by using FlowFile attributes.
You can achieve this with the ExtractText processor: use a regular expression to separate the hostnames in the CSV file.
For more information, see
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.12.1/org.apache.nifi.processors.standard.ExtractText/
How can I extract a substring from a flowfile data in Nifi?
If needed, you can split the incoming FlowFile into one FlowFile per hostname by using the SplitContent processor.
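To check a regular expression before putting it into ExtractText, a small Python sketch like the following may help. It only demonstrates the pattern on the sample line from the question; extract_hostnames is a hypothetical helper, and regex details can differ slightly between Python and the Java regex engine NiFi uses.

```python
import re
from typing import List

# One or more characters that are neither a comma nor whitespace.
HOSTNAME_PATTERN = re.compile(r"[^,\s]+")


def extract_hostnames(line: str) -> List[str]:
    """Return every comma-separated hostname found in the line."""
    return HOSTNAME_PATTERN.findall(line)


if __name__ == "__main__":
    sample = "SRHAPP001,SRHWEBAPP002,SRHDB006,SRHUATAPP4,ARHUATDB98"
    for host in extract_hostnames(sample):
        print(host)
```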

How to read from a CSV file

The problem:
I have a CSV file. I want to read from it, and use one of the values in it based on the content of my flow file. My flow file will be XML. I want to read the key using EvaluateXPath into an attribute, then use that key to read the corresponding value from the CSV file and put that into a flow file attribute.
I tried following this:
https://community.hortonworks.com/questions/174144/lookuprecord-and-simplecsvfilelookupservice-in-nif.html
but requiring several controller services, including a CSV writer, seems like a bit more than should be needed to solve this.
Since you're working with attributes (and only one lookup value), you can skip the record-based stuff and just use LookupAttribute with a SimpleCsvFileLookupService.
The record-based components are for doing multiple lookups per record and/or lookups for each record in a FlowFile. In your case it looks like you have one "record" and you really just want to look up an attribute from another attribute for the entire FlowFile, so the above solution should be more straightforward and easier to configure.
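Conceptually, the attribute lookup amounts to a key/value table built from two CSV columns. The sketch below shows that idea in plain Python and is not how NiFi implements it; build_lookup, lookup_attribute, the column names id and name, and the attribute names xml.key and lookup.value are all assumptions made up for illustration.

```python
import csv
import io
from typing import Dict


def build_lookup(csv_text: str, key_column: str, value_column: str) -> Dict[str, str]:
    """Build a key -> value table from two columns of a CSV with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row[value_column] for row in reader}


def lookup_attribute(
    attributes: Dict[str, str],
    table: Dict[str, str],
    key_attribute: str,
    target_attribute: str,
) -> Dict[str, str]:
    """Copy the attributes and add the looked-up value when the key is present."""
    enriched = dict(attributes)
    key = attributes.get(key_attribute)
    if key in table:
        enriched[target_attribute] = table[key]
    return enriched


if __name__ == "__main__":
    table = build_lookup("id,name\n1,alpha\n2,beta\n", "id", "name")
    # "xml.key" stands in for the attribute filled by EvaluateXPath.
    print(lookup_attribute({"xml.key": "2"}, table, "xml.key", "lookup.value"))
```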

Is there any way to make mathematical operations for some values in files with apache nifi?

I am getting some numerical data from a URL via an API, and I am looking for a way to perform some mathematical operations in Apache NiFi before writing the data to a directory. Thanks in advance.
By the way, I am using the InvokeHTTP processor to get the data and the PutFile processor to write the file. I searched some related websites but could not find a working approach.
Try using the QueryRecord processor, and define Record Reader/Writer controller services to read and write the flowfile.
Add a new property to the QueryRecord processor containing an Apache Calcite SQL query that applies your mathematical operations to the flowfile content.
The results of the SQL query will be written to the outgoing flowfile in your desired format.
Ultimately the answer depends on whether the data you're working with is in the content of the FlowFile or in the attributes. If the data is small enough and it's only a couple operations, the suggested approach would be to work with the data as attributes and use NiFi's expression language to do the transformations.
There is a section on mathematical operations[1] in the Apache documentation[2]. The operations range from simple operators like plus/minus to exposing the java.lang.Math static methods.
[1] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#numbers
[2] https://nifi.apache.org/docs.html
You can try ExecuteStreamCommand if you want to take in the whole file and then run operations on it. Alternatively, you can work with the attributes on the flowfile, depending on how large your operation is.
For example, if you have some initial values, you can include them in the name of your file, extract them into attributes, run the operations on those attributes, and then append the result to the bottom of the original file.
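The attribute-based approach boils down to: read a string attribute, convert it to a number, compute, and store the result back as a string. A minimal Python sketch of that pattern follows; apply_operation and the attribute names reading and reading_f are hypothetical, and in NiFi itself you would express the arithmetic with expression language functions rather than Python.

```python
from typing import Callable, Dict


def apply_operation(
    attributes: Dict[str, str],
    source: str,
    target: str,
    operation: Callable[[float], float],
) -> Dict[str, str]:
    """Return a copy of the attributes with target = operation(source)."""
    updated = dict(attributes)
    # Attributes are strings, so convert before computing and back afterwards.
    updated[target] = str(operation(float(attributes[source])))
    return updated


if __name__ == "__main__":
    # Example operation: Celsius to Fahrenheit.
    result = apply_operation(
        {"reading": "20"}, "reading", "reading_f", lambda c: c * 9 / 5 + 32
    )
    print(result)
```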

Apache Nifi EvaluateJsonPath with Multiple Inputs

I have JSON objects coming into Nifi via MQTT from two different inputs - for instance, let's say one is from a top sensor, and one is from a bottom sensor. Each of the sensors has its own MQTT topic, so I am using two different ConsumeMQTT Processors to ingest this data into my Nifi Flow.
JSON Object for top sensor is {"Top_Data": "value"}
JSON Object for bottom sensor is {"Bottom_Data": "value"}
I am currently using two separate EvaluateJsonPath Processors to store either the value of Top_Data or Bottom_Data in an attribute called sensorData.
How can I use some kind of if/or statement to only use one processor to EvaluateJsonPath for both of the JSON objects I could get from MQTT? Basically, I want to have an expression that says "If my JSON object has a property called Top_Data, use its value for the attribute sensorData, otherwise, use the value from the property Bottom_Data."
Maybe try the JSONPath expression
$[Top_Data,Bottom_Data]
in a single EvaluateJsonPath processor.
According to https://goessner.net/articles/JsonPath/ it is possible to use the union operator [,]:
[,] Union operator in XPath results in a combination of node sets. JSONPath allows alternate names or array indices as a set.
I have tested the expression using http://jsonpath.com/ and it should work.
Let us know if that helps.
You could try extracting both with EvaluateJsonPath (property 1: top: $['top'], property 2: bottom: $['bottom']), and don't forget to set Destination to flowfile-attribute.
Then route to UpdateAttribute and set the property finalData to ${top:isEmpty():ifElse(${bottom}, ${top})}.
If EvaluateJsonPath does not find an element, it sets the attribute to an empty string, so all you need to do is check whether either of them is empty and, if it is, use the other one as the final data.
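The fallback logic of that expression can be written out in plain Python to make the branches explicit. This is only a sketch of the idea, not NiFi code; sensor_data is a hypothetical helper, and it uses the property names from the question (Top_Data, Bottom_Data) rather than the placeholder names top and bottom.

```python
import json


def sensor_data(payload: str) -> str:
    """Return Top_Data when it is present and non-blank, otherwise Bottom_Data."""
    obj = json.loads(payload)
    # A missing property becomes an empty string, like an unmatched path.
    top = str(obj.get("Top_Data", ""))
    bottom = str(obj.get("Bottom_Data", ""))
    # Same shape as isEmpty():ifElse(bottom, top).
    return bottom if top.strip() == "" else top


if __name__ == "__main__":
    print(sensor_data('{"Top_Data": "value"}'))
    print(sensor_data('{"Bottom_Data": "value"}'))
```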

merging of flow files in the specified order

I am new to NiFi (using version 1.8.0). I need to consume Kafka messages, each of which contains a vehicle position as lat,lon. Since each message arrives as a separate flowfile, I need to merge all these flowfiles into one JSON file containing the complete path followed by the vehicle.
I am using a ConsumeKafka processor to subscribe to the messages, an UpdateAttribute processor (properties added are filename:${getStateValue("seq")},seq:${getStateValue("seq"):plus(1)}) to add a sequence number as the filename (e.g. the filename is 1, 2, 3, etc.), and a PutFile processor to write these files to the specified directory. I have configured a FIFO priority queue on all the success relationships between these processors.
Once I have received all the messages, I want to merge all the flowfiles. For this I know I have to use GetFile, EnforceOrder, MergeContent (merge strategy: bin-packing algorithm, merge format: binary concatenation), and PutFile processors, in that order. Is my approach correct? How can I make sure the files are merged in the sequence of their names, given that the filename is a sequence number? What should I put in the Order Attribute property of the EnforceOrder processor? What should I put in Group Identifier? Are there more custom fields to be added to the EnforceOrder processor?
EnforceOrder processor documentation
1. Group Identifier
This property is evaluated against each flowfile. In your case, use an UpdateAttribute processor to add a group_name attribute, and use ${group_name} as the Group Identifier property value.
2. Order Attribute
Expression language is not supported. You can use filename, or create a new attribute in the UpdateAttribute processor and use that attribute name as the Order Attribute property value.
For reference on using the EnforceOrder processor, use this template and upload it to your NiFi instance.
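One thing worth checking with sequence numbers as filenames is that they are compared as numbers, not as text (as text, "10" sorts before "2"). The Python sketch below illustrates the ordering-then-merge idea outside NiFi; merge_in_order and the flowfile dictionaries with lat/lon content are assumptions for illustration, since the question does not show the actual message format.

```python
import json
from typing import Any, Dict, List


def merge_in_order(flowfiles: List[Dict[str, Any]]) -> str:
    """Sort flowfiles by numeric filename and merge their content into a JSON array."""
    # int() makes 2 sort before 10, unlike a plain string comparison.
    ordered = sorted(flowfiles, key=lambda ff: int(ff["filename"]))
    return json.dumps([ff["content"] for ff in ordered])


if __name__ == "__main__":
    # Hypothetical positions arriving out of order.
    incoming = [
        {"filename": "10", "content": {"lat": 3.0, "lon": 3.5}},
        {"filename": "2", "content": {"lat": 2.0, "lon": 2.5}},
        {"filename": "1", "content": {"lat": 1.0, "lon": 1.5}},
    ]
    print(merge_in_order(incoming))
```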
