How to pass values dynamically from one processor to another processor using Apache NiFi - hadoop

I want to pass one processor's result as input to another processor using Apache NiFi.
I am getting values from MySQL using the ExecuteSQL processor. I want to pass this result dynamically to the SelectHiveQL processor in Apache NiFi.

ExecuteSQL outputs a result set as Avro. If you would like to process each row individually, you can use SplitAvro then ConvertAvroToJSON, or ConvertAvroToJSON then SplitJson. At that point you can use EvaluateJsonPath to extract values into attributes (for use with NiFi Expression Language), and at some point you will likely want ReplaceText where you set the content of the flow file to a HiveQL statement (for use by SelectHiveQL).
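As a minimal sketch of the last two steps (the id field, the user.id attribute name, and the Hive table are hypothetical):

    EvaluateJsonPath  (Destination: flowfile-attribute)
        user.id:  $.id                                        -- pull the id field from the JSON row into an attribute

    ReplaceText  (Replacement Strategy: Always Replace)
        Replacement Value:  SELECT * FROM hive_table WHERE id = ${user.id}

SelectHiveQL then executes the statement found in the flow file content when its HiveQL Select Query property is left blank.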

Related

Apache NiFi for data masking

We are using NiFi as our main data ingestion engine. NiFi is used to ingest data from multiple sources like DB, blob storage, etc., and all of the data is pushed to Kafka (with Avro as the serialization format). Now, one of the requirements is to mask specific fields (PII) in the input data.
Is NiFi a good tool for that?
Does it have any processor to support data masking/obfuscation?
NiFi comes with the EncryptContent processor for encrypting data, and the CryptographicHashContent and CryptographicHashAttribute processors for hashing it.
I would look into these first.
In addition, ReplaceText can do simple masking. An ExecuteScript processor can perform custom masking, or a combination of UpdateRecord with a ScriptedRecordSetWriter can easily mask certain fields in a record, as in the sketch below.
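For example, a minimal UpdateRecord configuration that overwrites a PII field with a fixed mask (the /ssn path and the reader/writer choices are assumptions for illustration):

    UpdateRecord
        Record Reader:               JsonTreeReader         -- assuming JSON records
        Record Writer:               JsonRecordSetWriter
        Replacement Value Strategy:  Literal Value
        /ssn:                        ****-**-****           -- hypothetical PII field, replaced wholesale

Each dynamic property pairs a record path with the value to write, so more PII fields can be masked by adding more properties.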

How can I transfer data from HDFS to Oracle using Apache NiFi?

I am new to NiFi and looking for information on using NiFi processors to get speeds up to 100 MB/s.
First, use the GetHDFS processor to retrieve the HDFS file as a flow file:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.11.4/org.apache.nifi.processors.hadoop.GetHDFS/index.html
To put data into Oracle, you can use the PutDatabaseRecord processor:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.11.4/org.apache.nifi.processors.standard.PutDatabaseRecord/
In between, depending on your requirements, you can use ExecuteGroovyScript, for example, to transform your flow file into a query:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-groovyx-nar/1.11.4/org.apache.nifi.processors.groovyx.ExecuteGroovyScript/index.html
All available processors: https://nifi.apache.org/docs.html
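As a minimal sketch of the Oracle side (the table name is hypothetical, and the reader choice assumes the HDFS files are Avro):

    PutDatabaseRecord
        Record Reader:                        AvroReader           -- adjust to the actual file format
        Database Connection Pooling Service:  DBCPConnectionPool   -- configured with the Oracle JDBC driver
        Statement Type:                       INSERT
        Table Name:                           TARGET_TABLE         -- hypothetical Oracle table

Because PutDatabaseRecord inserts a whole record set per flow file rather than one row at a time, it helps with the 100 MB/s throughput goal.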

How to ingest a multiple-record JSON array into SQL Server using PutDatabaseRecord

The problem is that I am not able to process a batch of JSON records that came as output of the QueryCassandra processor. I am able to process them record by record using the SplitJson processor before PutDatabaseRecord.
I am trying to use a JsonPathReader in PutDatabaseRecord. How can I configure the PutDatabaseRecord processor or the JsonPathReader in order to process all records of the JSON at once?

How can I filter flow files upon the result of an SQL query?

Would it be possible to route flow files according to the result of an SQL query which returns a single row result? For example, if the result is '1' the flow file will be processed; otherwise, it will be ignored.
Solution
The following approach worked best for me.
Use the ExecuteSQL processor to run the filtering SQL query. The query was written to produce either a single record (match) or an empty record set (no match), in the way suggested by Shu.
Connect ExecuteSQL to a RouteOnAttribute processor to filter out unmatched flow files, using the following routing property value: ${executesql.row.count:replaceNull(0):gt(0)}
Note that the original content of a flow file is lost after applying ExecuteSQL. That is not an issue in my case, because I do the filtering before processing the flow file content, and my SQL query is based entirely on the flow file attributes, not on its content. In a more general scenario, where the flow file content is modified by the upstream part of the flow, one should save the content somewhere (e.g. the file system) and restore it after the filtering step has been applied.
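A minimal sketch of the pair (the filter query itself is hypothetical):

    ExecuteSQL
        SQL select query:  SELECT 1 FROM my_table WHERE id = ${my.attribute}   -- hypothetical attribute-driven filter

    RouteOnAttribute
        matched:  ${executesql.row.count:replaceNull(0):gt(0)}

Only flow files that take the matched relationship continue; the unmatched relationship can be auto-terminated.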
You can add a WHERE clause to your SQL query (where <field_name> = 1); then we are only going to output a flow file when the result value is 1.
(or)
Checking the data in NiFi:
We are going to have Avro-format data as the result of the SQL query, so you can use one of the following options.
Option 1: ConvertAvroToJSON processor:
Convert the Avro data into JSON format, then extract the value from the JSON content into an attribute using the EvaluateJsonPath processor.
Then use a RouteOnAttribute processor: add a new property that uses the NiFi Expression Language equals function to compare the value and route the flow file to the matched relation, as sketched below.
Refer to this link for more details regarding EvaluateJsonPath and RouteOnAttribute processor configs.
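A minimal sketch of option 1 (the attribute name and JSON path are assumptions):

    EvaluateJsonPath  (Destination: flowfile-attribute)
        result.value:  $[0].field_name          -- hypothetical path to the queried field

    RouteOnAttribute
        matched:  ${result.value:equals(1)}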
Option 2: Using the QueryRecord processor:
With the QueryRecord processor we can run SQL queries on the content of the flow file.
Add a new property to the processor, such as
select * from FLOWFILE where <field_name> = 1
then feed that property's relationship to the next processor (see the sketch below).
Refer to this link for more details regarding QueryRecord processor usage.
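A minimal sketch of option 2 (the reader/writer choices are assumptions; each dynamic property defines a relationship named after the property):

    QueryRecord
        Record Reader:   AvroReader              -- ExecuteSQL output is Avro
        Record Writer:   JsonRecordSetWriter     -- or whatever the next processor expects
        matched:         SELECT * FROM FLOWFILE WHERE field_name = 1

The matching records leave on the matched relationship; a downstream check on the record.count attribute can drop empty results if needed.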

Data aggregation in Apache Nifi

I am using Apache NiFi to process data from different sources, and I have independent pipelines created for each data flow. I want to combine this data to process it further. Is there any way I can aggregate the data and write it to a single file? The data is present in the form of flow file attributes in NiFi.
You should use the MergeContent processor, which accepts configuration values for min/max batch size, etc., and combines a number of flow files into a single flow file according to the provided merge strategy.
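A minimal sketch (batch sizes are illustrative; since the data lives in attributes, an upstream AttributesToJSON with Destination set to flowfile-content can first move it into the content):

    AttributesToJSON  (Destination: flowfile-content)      -- turn the attributes into JSON content

    MergeContent
        Merge Strategy:             Bin-Packing Algorithm
        Merge Format:               Binary Concatenation
        Minimum Number of Entries:  100                    -- hypothetical batch size
        Max Bin Age:                5 min                  -- flush incomplete bins after this long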
