Apache NiFi for data masking

We are using NiFi as our main data ingestion engine. NiFi ingests data from multiple sources (databases, blob storage, etc.), and all of the data is pushed to Kafka, with Avro as the serialization format. One of our requirements is to mask specific fields (PII) in the input data.
Is NiFi a good tool for that?
Does it have any processors to support data masking/obfuscation?

NiFi comes with the EncryptContent, CryptographicHashContent, and CryptographicHashAttribute processors, which can be used to encrypt and hash data respectively.
I would look into those first.
In addition, ReplaceText can do simple masking. An ExecuteScript processor can perform custom masking, and a combination of UpdateRecord with a ScriptedRecordSetWriter can easily mask certain fields in a record.
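For the ExecuteScript route, here is a minimal sketch of a Jython script body, assuming the records have already been converted to JSON upstream (e.g., with ConvertAvroToJSON) and that the PII field names are `ssn` and `email` (both assumptions; adapt to your schema):

```python
# Hypothetical Jython script body for an ExecuteScript processor; masks two
# assumed PII fields in a single-record JSON flowfile.
import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

PII_FIELDS = ['ssn', 'email']  # assumption: names of the fields to mask

class MaskPii(StreamCallback):
    def process(self, inputStream, outputStream):
        record = json.loads(IOUtils.toString(inputStream, StandardCharsets.UTF_8))
        for field in PII_FIELDS:
            if field in record:
                record[field] = '****'  # replace the value with a fixed mask
        outputStream.write(bytearray(json.dumps(record).encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, MaskPii())
    session.transfer(flowFile, REL_SUCCESS)
```

For Avro-heavy flows like yours, the UpdateRecord/ScriptedRecordSetWriter approach avoids the JSON conversion entirely and is generally the more efficient choice.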

Related

How to convert an Avro schema into line protocol in order to insert data into InfluxDB with Apache NiFi

I am creating a data pipeline with Apache NiFi to copy data from a remote MySQL database into InfluxDB.
I use the QueryDatabaseTable processor to extract the data from the MySQL database, then I use UpdateRecord to do some data transformation, and I would like to use PutInfluxDB to insert the time series into my local InfluxDB instance on Linux.
The data coming from the QueryDatabaseTable processor uses an Avro schema, and I need to convert it into line protocol by configuring which fields are tags and which are measurement values.
However, I cannot find any processor that performs this conversion.
Any hints?
Thanks,
Bernardo
There is no built-in processor for InfluxDB Line Protocol conversions. You could write a ScriptedRecordSetWriter if you wanted to do it yourself; however, there is a project by InfluxData that already implements a Line Protocol reader for NiFi and appears to be active and up to date.
See that project's documentation for instructions on adding it to NiFi.
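If you do go the scripted route, the core of the conversion is small. Here is a plain-Python sketch of the record-to-line-protocol mapping (the measurement, tag, and field names are hypothetical, and escaping of spaces and commas in identifiers is omitted):

```python
# Sketch of mapping one record to InfluxDB line protocol:
#   <measurement>,<tag_set> <field_set> <timestamp>
def fmt(value):
    if isinstance(value, bool):      # check bool before int: bool subclasses int
        return 'true' if value else 'false'
    if isinstance(value, int):
        return '%di' % value         # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    return '"%s"' % str(value).replace('"', '\\"')

def to_line_protocol(measurement, tags, fields, timestamp_ns):
    tag_set = ','.join('%s=%s' % kv for kv in sorted(tags.items()))
    field_set = ','.join('%s=%s' % (k, fmt(v)) for k, v in sorted(fields.items()))
    return '%s,%s %s %d' % (measurement, tag_set, field_set, timestamp_ns)

# e.g. one row extracted from the Avro result set (names are hypothetical)
print(to_line_protocol('cpu_load', {'host': 'db01'}, {'value': 0.64},
                       1462425600000000000))
```

Running the example prints `cpu_load,host=db01 value=0.64 1462425600000000000`, which matches the `<measurement>,<tag_set> <field_set> <timestamp>` shape that line protocol requires.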

How can I transfer data from HDFS to Oracle using Apache NiFi?

I am new to NiFi and looking for information on using NiFi processors to reach speeds of up to 100 MB/s.
First, you should use the GetHDFS processor to retrieve the HDFS files as flowfiles:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-hadoop-nar/1.11.4/org.apache.nifi.processors.hadoop.GetHDFS/index.html
To put the data into Oracle, you can use the PutDatabaseRecord processor:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.11.4/org.apache.nifi.processors.standard.PutDatabaseRecord/
Between the two, depending on your requirements, you can use ExecuteGroovyScript, for example, to transform your flowfile content (see the sketch after the link below):
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-groovyx-nar/1.11.4/org.apache.nifi.processors.groovyx.ExecuteGroovyScript/index.html
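The answer above mentions Groovy, but NiFi's scripting processors support other engines too. As a hypothetical Jython sketch for ExecuteScript, a transform step between GetHDFS and PutDatabaseRecord might reshape a CSV header so the record reader's field names line up with the Oracle table's columns (the `HEADER_MAP` entries are assumptions):

```python
# Hypothetical Jython script body for ExecuteScript: rename CSV header columns
# so they match the target Oracle column names before PutDatabaseRecord.
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

HEADER_MAP = {'cust_id': 'CUSTOMER_ID', 'cust_name': 'CUSTOMER_NAME'}  # assumed

class RenameHeader(StreamCallback):
    def process(self, inputStream, outputStream):
        lines = IOUtils.toString(inputStream, StandardCharsets.UTF_8).split('\n')
        header = [HEADER_MAP.get(col.strip(), col.strip())
                  for col in lines[0].split(',')]
        lines[0] = ','.join(header)  # only the header row is rewritten
        outputStream.write(bytearray('\n'.join(lines).encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, RenameHeader())
    session.transfer(flowFile, REL_SUCCESS)
```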
All available processors are listed at https://nifi.apache.org/docs.html.

How to pass values dynamically from one processor to another processor using Apache NiFi

I want to pass one processor's result as input to another processor using Apache NiFi.
I am getting values from MySQL using the ExecuteSQL processor, and I want to pass this result dynamically to the SelectHiveQL processor in Apache NiFi.
ExecuteSQL outputs a result set as Avro. If you would like to process each row individually, you can use SplitAvro then ConvertAvroToJSON, or ConvertAvroToJSON then SplitJson. At that point you can use EvaluateJsonPath to extract values into attributes (for use with NiFi Expression Language), and at some point you will likely want ReplaceText to set the content of the flowfile to a HiveQL statement (for use by SelectHiveQL).
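To make the last two steps concrete, here is a plain-Python illustration of what the EvaluateJsonPath and ReplaceText pair does for a single row (the table and column names are hypothetical):

```python
# Illustration only -- this logic runs inside NiFi processors, not in a script.
import json

row = json.loads('{"customer_id": 42}')  # one row after ConvertAvroToJSON + SplitJson

# EvaluateJsonPath: JsonPath $.customer_id -> flowfile attribute "customer.id"
attributes = {'customer.id': row['customer_id']}

# ReplaceText (Replacement Strategy: Always Replace) then rewrites the content
# using Expression Language, e.g.
#   SELECT * FROM orders WHERE customer_id = ${customer.id}
content = 'SELECT * FROM orders WHERE customer_id = %s' % attributes['customer.id']
print(content)  # this content is the statement SelectHiveQL executes
```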

Data aggregation in Apache NiFi

I am using Apache NiFi to process data from different sources, and I have independent pipelines created for each data flow. I want to combine this data for further processing. Is there any way I can aggregate the data and write it to a single file? The data is present in the form of flowfile attributes in NiFi.
You should use the MergeContent processor, which accepts configuration values for minimum/maximum batch size, etc., and combines a number of flowfiles into a single flowfile according to the provided merge strategy. Since your data lives in flowfile attributes rather than content, you may first need a processor such as AttributesToJSON (with its Destination set to flowfile-content) to write those attributes into the content before merging.
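For reference, a minimal MergeContent configuration sketch might look like the following; the threshold values are assumptions to tune for your throughput and latency needs:

```
Merge Strategy            : Bin-Packing Algorithm
Merge Format              : Binary Concatenation
Minimum Number of Entries : 100
Maximum Number of Entries : 1000
Max Bin Age               : 5 min
```

With Binary Concatenation you can also set a Delimiter Strategy to insert a header, footer, or demarcator (e.g., a newline) between the merged entries.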

ETL lookup transformation equivalent in NiFi

I am evaluating Apache NiFi for moving data from a PostgreSQL database to Hive. While moving the data to Hive, I have a requirement to mask/transform some of the data attributes using a lookup table, similar to what can be done with the lookup transformations available in ETL tools. What options are available to achieve the same in Apache NiFi?
You should take a look at the ReplaceText and ReplaceTextWithMapping processors. In addition, there are EncryptContent, HashAttribute, and HashContent processors available to perform standard cryptographic operations on the data.
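ReplaceTextWithMapping reads its substitutions from a mapping file of tab-separated search/replace pairs. A minimal sketch, with hypothetical values to mask:

```
4111111111111111	XXXX-XXXX-XXXX-1111
jane.doe@example.com	masked@redacted.invalid
```

Each occurrence of a left-hand value in the flowfile content is replaced with the corresponding right-hand value, which makes this a reasonable fit for small, static lookup tables.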
