Is there any way to do mathematical operations on some values in files with Apache NiFi?

I am getting some numerical data from a URL via an API, and I am looking for a way to do some mathematical operations on it in Apache NiFi before putting the data into a file directory. Thanks in advance.
By the way, I am using the InvokeHTTP processor to get the data and the PutFile processor to put the file somewhere. I searched some related websites but could not find a working approach.

Try using the QueryRecord processor and define Record Reader/Writer controller services to read/write the flowfile.
Add a new property to the QueryRecord processor containing an Apache Calcite SQL query with your mathematical operations on the flowfile.
The results of the SQL query will be added to the outgoing flowfile in your desired format.
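For example, a dynamic property named "math" on QueryRecord could hold a query like the following (a sketch; the field names price and quantity are hypothetical, and FLOWFILE refers to the incoming flowfile):

    SELECT price,
           quantity,
           price * quantity AS total,     -- simple arithmetic
           SQRT(price)      AS sqrt_price -- Calcite built-in function
    FROM FLOWFILE

Each such property name becomes a relationship on the processor, which you can then route to your PutFile processor.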

Ultimately the answer depends on whether the data you're working with is in the content of the FlowFile or in the attributes. If the data is small enough and it's only a couple of operations, the suggested approach would be to work with the data as attributes and use NiFi's expression language to do the transformations.
There is a section on mathematical operations[1] in the Apache documentation[2]. The operations range from simple operators like plus/minus to the java.lang.Math static methods exposed through the math() function.
[1] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#numbers
[2] https://nifi.apache.org/docs.html
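For instance, an UpdateAttribute processor could compute new attributes with expressions like these (a sketch; the attribute names price and quantity are hypothetical):

    ${price:toNumber():plus(10)}                          # addition
    ${price:toNumber():multiply(${quantity:toNumber()})}  # price * quantity
    ${price:toDecimal():math("sqrt")}                     # java.lang.Math.sqrt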

You can try ExecuteStreamCommand if you want to take in the whole file and then run operations on it. Alternatively, you can work with the attributes on the flowfile, depending on how large your operation is.
For example, if you have some initial values, you can include them in the name of your file, extract them into attributes, run the operations with the expression language, and then append the result to the original file; a sketch follows.
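Assuming a hypothetical filename like result_5_10.txt and two UpdateAttribute processors (properties in a single UpdateAttribute are evaluated against the incoming attributes, so the computation has to happen in a second one):

    # first UpdateAttribute: extract the operands (fields are 1-indexed)
    operand.a = ${filename:getDelimitedField(2, '_')}
    operand.b = ${filename:getDelimitedField(3, '_'):substringBefore('.')}

    # second UpdateAttribute: do the math
    sum = ${operand.a:toNumber():plus(${operand.b:toNumber()})}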

Related

How to read from a CSV file

The problem:
I have a CSV file. I want to read from it and use one of its values based on the content of my flow file. My flow file will be XML. I want to read the key into an attribute using EvaluateXPath, then use that key to read the corresponding value from the CSV file and put it into a flow file attribute.
I tried following this:
https://community.hortonworks.com/questions/174144/lookuprecord-and-simplecsvfilelookupservice-in-nif.html
but found that requiring several controller services, including a CSV writer, was a bit more than I would think is needed to solve this.
Since you're working with attributes (and only one lookup value), you can skip the record-based stuff and just use LookupAttribute with a SimpleCsvFileLookupService.
The record-based components are for doing multiple lookups per record and/or lookups for each record in a FlowFile. In your case it looks like you have one "record" and you really just want to look up an attribute from another attribute for the entire FlowFile, so the above solution should be more straightforward and easier to configure.
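A minimal configuration sketch, assuming EvaluateXPath stored the key in an attribute named csv.key (the attribute names and file path are hypothetical):

    SimpleCsvFileLookupService:
      CSV File            = /path/to/lookup.csv
      Lookup Key Column   = key
      Lookup Value Column = value

    LookupAttribute:
      Lookup Service = SimpleCsvFileLookupService
      csv.value      = ${csv.key}   # dynamic property: attribute to set = key to look up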

How to reduce request body in Elasticsearch

Sometimes I'm facing very large Elasticsearch queries with duplicated parts, applying the same filtering structure to aggregations (for every aggregation field). Such queries are too massive to inspect. Is there any way to decrease the request body size? Some kind of alias, maybe; I need something like variables in YAML. Or maybe you could suggest something else. Thanks!
Please have a look at search templates. You'll be able to store query templates in the cluster, use variables, and even build dynamic queries:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-template.html
Using this feature will reduce your request body dramatically, as you'll just refer to a pre-registered template, providing some parameters if needed.
Repeating blocks and conditional sections are possible using the mustache templating language: http://mustache.github.io/mustache.5.html
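For example, you register a template once and then call it with parameters (a sketch; the template id, index, and field names are hypothetical):

    # register the template once
    PUT _scripts/filtered_aggs
    {
      "script": {
        "lang": "mustache",
        "source": {
          "query": { "term": { "status": "{{status}}" } },
          "aggs": {
            "by_field": { "terms": { "field": "{{agg_field}}" } }
          }
        }
      }
    }

    # call it with parameters instead of repeating the whole body
    GET my-index/_search/template
    {
      "id": "filtered_aggs",
      "params": { "status": "active", "agg_field": "category" }
    }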
Have fun!

Apache NiFi revert flowfile attribute

In my NiFi 1.3.0 dataflow, the FetchElasticsearchHttp processor changes the filename attribute to its corresponding ID in the database. I was wondering if there is a way of changing it back using some of NiFi's built-in processors.
I have thought about simply writing my own script to correct this, but there seems to be no way of knowing which file it is, so I can't just grab its name.
If I understood you correctly, you can use UpdateAttribute to copy the filename attribute to another attribute. There's no way to stop the fetch processor from overwriting filename, but you can certainly stash it away yourself. The trick is to copy/rename it before invoking the fetch processor.
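A sketch of that pattern (the attribute name original.filename is just an example):

    # UpdateAttribute placed before FetchElasticsearchHttp
    original.filename = ${filename}

    # UpdateAttribute placed after the fetch, to restore it
    filename = ${original.filename}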

How to add metadata to a document using the MarkLogic mapreduce connector API

I want to write a document to a MarkLogic database using the MarkLogic mapreduce API; let's say here is the example. I want to add metadata to the document that I am writing back to the MarkLogic database in the reducer:
context.write(outputURI, result);
Please let me know if adding metadata to the document with the MarkLogic mapreduce API is possible.
For metadata, I am assuming you are talking about the document properties fragment. For background on document properties, please see here: https://docs.marklogic.com/guide/app-dev/properties#id_19516
For use in MarkLogic mapreduce, please see here (the output classes):
https://docs.marklogic.com/guide/mapreduce/output#id_76625
I believe you need to extend/modify your example to also write content to the properties fragment using the PropertyOutputFormat class.
One of the sample applications in the same documentation saves content in the properties fragment. If, however, you would like to fast-track yourself by looking at some source code, see the examples at https://gist.github.com/evanlenz/2484318, specifically LinkCountInProperty.java, which writes to a document property fragment.
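A minimal driver sketch, assuming the connector's PropertyOutputFormat takes DocumentURI keys and MarkLogicNode values as in the linked example (check the output classes documentation above for the exact types):

    // Hypothetical sketch: send the reducer's output into each
    // document's properties fragment instead of the document content.
    Job job = Job.getInstance(conf, "write-to-properties");
    job.setOutputFormatClass(PropertyOutputFormat.class); // com.marklogic.mapreduce
    job.setOutputKeyClass(DocumentURI.class);
    job.setOutputValueClass(MarkLogicNode.class);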
I used the property mapreduce.marklogic.output.content.collection in the configuration XML. Adding this property inserted the data into that collection.
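In the Hadoop configuration XML that looks like this (the collection name is a placeholder):

    <property>
      <name>mapreduce.marklogic.output.content.collection</name>
      <value>mycollection</value>
    </property>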

Apache NiFi to split data based on condition

Our requirement is to split the flow data based on a condition.
We thought of using the "ExecuteStreamCommand" processor for that (internally it will use a Java class), but it gives a single output flowfile only. We would like to have two flowfiles: one for the matched criteria and another for the unmatched.
I looked at the "RouteText" processor, but it has no feature for using a Java class as part of it.
Let me know if anyone has any suggestion.
I think you could use GetMongo to read those definition values and store them in a map accessed by DistributedMapCacheClientService, then use RouteOnContent to route the incoming flowfiles based on the absence/presence of the retrieved values.
If that doesn't work, you could instead route the query result from GetMongo to PutFile and then use ScanContent, which reads from a dictionary file on the file system and routes flowfiles based on the absence/presence of those keywords in the content.
Finally, if all else fails, you can use ExecuteScript to combine those steps into a single processor. It runs Groovy code easily, so you can directly invoke your existing Java class if necessary; note that ExecuteScript only exposes success/failure relationships, so one common pattern is to tag each flowfile with a matched attribute and branch with RouteOnAttribute afterwards.
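A minimal Groovy sketch of that pattern (the keyword test is a placeholder for your own Java logic):

    import org.apache.nifi.processor.io.InputStreamCallback

    def flowFile = session.get()
    if (flowFile == null) return

    def matched = false
    session.read(flowFile, { inputStream ->
        // placeholder condition; swap in a call to your existing Java class
        matched = inputStream.getText('UTF-8').contains('ERROR')
    } as InputStreamCallback)

    // tag the result; a downstream RouteOnAttribute can branch on it
    flowFile = session.putAttribute(flowFile, 'matched', matched.toString())
    session.transfer(flowFile, REL_SUCCESS)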
