EvaluateJsonPath creates additional flowfiles - apache-nifi

So I'm using EvaluateJsonPath to extract three JSON values to attributes, but I ran into an issue where a single flowfile turns into three flowfiles with the same filename but different UUIDs after passing through EvaluateJsonPath. Is this how it's supposed to work, or am I missing something?
I have 'flowfile-attribute' set as the Destination property value.
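For illustration, my configuration looks roughly like this (the property names and JSON paths here are placeholders):

EvaluateJsonPath:
  Destination:  flowfile-attribute
  attr1:        $.field1
  attr2:        $.field2
  attr3:        $.field3

As I understand it, with Destination set to flowfile-attribute this should emit exactly one flowfile per input, with the three values added as attributes.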

It must have been some kind of bug; I added another processor configured exactly the same way and it works with no issues.

Related

Problem while using RouteOnAttribute (cannot read JSON attribute and always sends flow to unmatched)

While using the RouteOnAttribute NiFi processor, I have the JSON input
[{"dev":"xyz","detail":"abc"}]
which I got from a ConvertRecord processor.
Routing Strategy: Route to Property name
ifmatch: ${dev:equals( "xyz" )}
I also tried ${dev:matches( "xyz")}, with both single quotes and double quotes, and I'm still not getting the flowfile redirected to "ifmatch"; it keeps redirecting to unmatched.
Is there any way to resolve this? I have tried many other options.
The flowfile content is different from the attributes. Content is arbitrary bytes -- it could be empty, plain text, kilobytes of XML, or gigabytes of video or other binary data. Each flowfile also has attributes, which are key/value pairs of Strings kept in memory. Expression language statements such as ${dev:equals("xyz")} evaluate against attributes, and since "dev" here is a field inside the JSON content rather than an attribute, the expression never matches and the flowfile always routes to unmatched.
If you want to route on this piece of data, you have multiple options:
Use RouteText or RouteOnContent to work on the actual flowfile content directly.
Extract the value to an attribute using EvaluateJsonPath and then route on that attribute (see the sketch after this list).
The Apache NiFi User Guide and In-Depth provide more information around this distinction.
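A minimal sketch of the second option, assuming the array-wrapped JSON above:

EvaluateJsonPath:
  Destination:  flowfile-attribute
  dev:          $[0].dev

RouteOnAttribute:
  Routing Strategy:  Route to Property name
  ifmatch:           ${dev:equals('xyz')}

EvaluateJsonPath copies the JSON field into a "dev" attribute, after which the original expression matches and the flowfile routes to ifmatch.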

How to read from a CSV file

The problem:
I have a CSV file. I want to read from it and use one of its values based on the content of my flow file. My flow file will be XML. I want to read the key into an attribute using EvaluateXPath, then use that key to read the corresponding value from the CSV file and put it into a flow file attribute.
I tried following this:
https://community.hortonworks.com/questions/174144/lookuprecord-and-simplecsvfilelookupservice-in-nif.html
but found that requiring several controller services, including a CSV writer, is a bit more than I would think is needed to solve this.
Since you're working with attributes (and only one lookup value), you can skip the record-based stuff and just use LookupAttribute with a SimpleCsvFileLookupService.
The record-based components are for doing multiple lookups per record and/or lookups for each record in a FlowFile. In your case it looks like you have one "record" and you really just want to look up an attribute from another attribute for the entire FlowFile, so the above solution should be more straightforward and easier to configure.
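A minimal sketch of that configuration (the file path, column names, and the "myKey"/"myValue" attribute names are illustrative):

SimpleCsvFileLookupService:
  CSV File:            /path/to/lookup.csv
  Lookup Key Column:   key
  Lookup Value Column: value

LookupAttribute:
  Lookup Service:  SimpleCsvFileLookupService
  myValue:         ${myKey}

Here "myKey" is the attribute you populated with EvaluateXPath; LookupAttribute looks its value up in the CSV and writes the result to a new "myValue" attribute.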

Is there any way to perform mathematical operations on values in files with Apache NiFi?

I am getting some numerical data from a URL via an API, and I am looking for a way to perform some mathematical operations on it in Apache NiFi before putting the data into a file directory. Thanks in advance.
By the way, I am using the InvokeHTTP processor to get the data and the PutFile processor to write it out. I searched some related websites but could not find a working approach.
Try using the QueryRecord processor and define Record Reader/Writer controller services to read/write the flowfile.
Add a new property to the QueryRecord processor containing an Apache Calcite SQL query with your mathematical operations on the flowfile records.
The results of the SQL query will be written to the outgoing flowfile in your desired format.
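For example, a dynamic property named "math" could hold a query like this (a sketch; "value" is a hypothetical numeric field in your record schema):

SELECT value,
       value * 2   AS doubled,
       value + 100 AS shifted
FROM FLOWFILE

QueryRecord exposes the incoming content as a table named FLOWFILE and routes the query results to a relationship named after the property ("math" here).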
Ultimately the answer depends on whether the data you're working with is in the content of the FlowFile or in the attributes. If the data is small enough and it's only a couple of operations, the suggested approach would be to work with the data as attributes and use NiFi's expression language to do the transformations.
There is a section on mathematical operations [1] in the Apache documentation [2]. The operations range from simple arithmetic like plus/minus to exposing the java.lang.Math static methods.
[1] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#numbers
[2] https://nifi.apache.org/docs.html
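For example (a sketch; "amount" is a hypothetical attribute set upstream, e.g. by EvaluateJsonPath):

${amount:toNumber():multiply(2):plus(10)}
${literal(64):toDecimal():math('sqrt')}

The first expression does plain arithmetic on an attribute; the second calls through to a java.lang.Math method as described in the guide.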
You can try ExecuteStreamCommand if you want to take in the whole file and then run operations on it. Alternatively, you can work with variables on the flowfile, depending on how large your operation is.
For example, if you have some initial values, you can include them in the name of your file, extract them into attributes, run the operations with expression language, and then append the results to the bottom of the original file (see the sketch below).
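A sketch of that attribute-based flow, assuming hypothetical filenames like report_42.txt:

UpdateAttribute:
  x:  ${filename:substringAfter('_'):substringBefore('.'):toNumber()}

ReplaceText:
  Replacement Strategy:  Append
  Replacement Value:     ${x:multiply(3)}

UpdateAttribute pulls the number embedded in the filename into an "x" attribute, and ReplaceText appends the computed result to the end of the content.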

NiFi PutElasticsearch5 error - Content type is missing

I am using the PutElasticsearch5 processor to index documents into ES. My workflow has a couple of other processors before PutElasticsearch5 which convert Avro to JSON.
I am getting the error below when I run the workflow.
java.lang.IllegalArgumentException: Validation Failed: 1: content type is missing;2: content type is missing;
I couldn't find any other relevant information to troubleshoot this. There is no setting for "Content Type" under the PutElasticsearch5 configuration.
I'm also having this issue. Like user2297083 said, if you are sending a batched JSON file into PutElasticsearch5 then it will throw this exception and move the file to the FAILED relationship. The processor seems to handle only one JSON object per file at a time, and that object cannot be surrounded by array brackets. So if you have a file with content such as:
[{"key":"value"}]
then the processor will fail; however, if you send the same document as:
{"key":"value"}
then the processor will index it successfully, assuming your other configurations are correct.
One solution, if you don't want to send everything through a splitter before it reaches PutElasticsearch5, is to attach a splitter processor to the FAILURE relationship of PutElasticsearch5 and feed the split results back into the same PutElasticsearch5. More FlowFiles means more IO on your node, so I'm actively looking for a way to have the PutElasticsearch5 processor handle a batched JSON document. I feel like there has to be a way without writing a custom iteration of it or creating a ton of new FlowFiles.
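A sketch of the splitting step, using the standard SplitJson processor:

SplitJson:
  JsonPath Expression:  $[*]

This turns a batched array such as
[{"one":1},{"two":2},{"three":3}]
into three flowfiles, each holding a single object like {"one":1}, which PutElasticsearch5 can then index.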
EDIT:
Actually, it does answer the question. His question is:
I am using the PutElasticsearch5 processor to index documents into ES. My workflow has a couple of other processors before PutElasticsearch5 which convert Avro to JSON.
I am getting the below given error when I run the workflow.
java.lang.IllegalArgumentException: Validation Failed: 1: content type is missing;2: content type is missing;
which is exactly the exception message given by the PutElasticsearch5 processor when it is passed a JSON file that is not formatted correctly. His question is why this is happening.
My answer states why it's happening (one possible cause) and how to work around it with a solution that does work.
In this case, correctly formatted JSON means a FlowFile that has a single JSON object as its content, as I have shown above.
Looking further into this, though, it makes sense that the processor only takes a single-JSON-document FlowFile at a time, because you can use FlowFile attributes to specify the "id" of the indexed document. If the UUID of the FlowFile were used with a batched JSON document, i.e.
[{"one":1},{"two":2},{"three":3}]
then each JSON object would be indexed in Elasticsearch using the same "index", "type", and "id" (the id being the FlowFile UUID), which would not be desired.

Apache NiFi to split data based on condition

Our requirement is to split the flow data based on a condition.
We thought to use the "ExecuteStreamCommand" processor for that (internally it will use a Java class), but it produces only a single output flowfile. We would like to have two output flowfiles: one for matched and another for unmatched criteria.
I looked at the "RouteText" processor but it has no feature to use a Java class as part of it.
Let me know if anyone has any suggestions.
I think you could use GetMongo to read those definition values and store them in a map accessed by DistributedMapCacheClientService, then use RouteOnContent to route the incoming flowfiles based on the absence/presence of the retrieved values.
If that doesn't work, you could instead route the query result from GetMongo to PutFile and then use ScanContent, which reads from a dictionary file on the file system and routes flowfiles based on the absence/presence of those keywords in the content.
Finally, if all else fails, you can use ExecuteScript to combine those steps into a single processor and route to matched/unmatched relationships. It handles Groovy code easily, so you can directly invoke your existing Java class if necessary (see the sketch below).
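A minimal Groovy sketch of the ExecuteScript approach (the keyword check stands in for your Java class; note that ExecuteScript itself only exposes success/failure relationships, so this sketch flags matches with an attribute to route on downstream -- truly custom matched/unmatched relationships would need InvokeScriptedProcessor):

import org.apache.nifi.processor.io.InputStreamCallback
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (flowFile == null) return

// Read the whole flowfile content as text
def text = ''
session.read(flowFile, { inputStream ->
    text = inputStream.getText(StandardCharsets.UTF_8.name())
} as InputStreamCallback)

// Hypothetical condition -- replace with a call into your own Java class
def matched = text.contains('xyz')

// Flag the result as an attribute; a downstream RouteOnAttribute can split on it
flowFile = session.putAttribute(flowFile, 'matched', String.valueOf(matched))
session.transfer(flowFile, REL_SUCCESS)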
