How to convert an Avro schema into line protocol in order to insert data into InfluxDB with Apache NiFi - apache-nifi

I am creating a data pipeline with Apache NiFi to copy data from a remote MySQL database into InfluxDB.
I use the QueryDatabaseTable processor to extract the data from MySQL, then UpdateRecord to do some data transformation, and I would like to use PutInfluxDB to insert the time series into my local Influx instance on Linux.
The data coming from the QueryDatabaseTable processor uses an Avro schema, and I need to convert it into line protocol by configuring which fields are tags and which are measurement values.
However, I cannot find any processor that performs this conversion.
Any hints?
Thanks,
Bernardo

There is no built-in processor for InfluxDB Line Protocol conversion. You could write a ScriptedRecordWriter if you wanted to do it yourself, but there is a project by InfluxData that already implements a Line Protocol reader for NiFi here, and it seems to be active and up to date.
See the documentation for adding it into NiFi here.
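If you do end up rolling your own, the core of the conversion is small. Below is a minimal Python sketch of turning a deserialized Avro record (represented here as a plain dict) into a line protocol string. The field names (`host`, `region`, `cpu`, `ts`) are hypothetical, and the sketch skips the escaping of spaces, commas, and quotes that real line protocol requires.

```python
def to_line_protocol(measurement, record, tag_keys, field_keys, time_key=None):
    """Build one line of InfluxDB line protocol:
    measurement,tag1=v1,... field1=v1,... [timestamp]
    Note: no escaping of special characters; real data needs it."""
    tags = ",".join(f"{k}={record[k]}" for k in tag_keys)
    # String field values are double-quoted in line protocol; numbers are not.
    fields = ",".join(
        f'{k}="{record[k]}"' if isinstance(record[k], str) else f"{k}={record[k]}"
        for k in field_keys
    )
    line = f"{measurement},{tags} {fields}"
    if time_key is not None:
        line += f" {record[time_key]}"
    return line

record = {"host": "server01", "region": "eu", "cpu": 0.64,
          "ts": 1465839830100400200}
print(to_line_protocol("cpu_load", record, ["host", "region"], ["cpu"], "ts"))
# cpu_load,host=server01,region=eu cpu=0.64 1465839830100400200
```

In a ScriptedRecordWriter you would apply the same tag/field split per record; the tag/field assignment itself is the configuration the question asks about.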

Related

How to do data transformation using Apache NiFi standard processors?

I have to do data transformation using Apache NiFi standard processors for the input data mentioned below. I have to add two new fields, class and year, and drop the extra price fields.
Below are my input data and transformed data.
Input data
Expected output
Disclaimer: I am assuming that your input headers are not dynamic, which means that you can maintain a predictable input schema. If that is true, you can do this with the standard processors as of 1.12.0, but it will require a little work.
Here's a blog post of mine about how to use ScriptedTransformRecord to take input from one schema, build a new data structure and mix it with another schema. It's a bit involved.
I've used that methodology recently to convert a much larger set of data into summary records, so I know it works. The summary of what's involved is this:
Create two schemas, one that matches input and one for output.
Set up ScriptedTransformRecord to use a writer that explicitly sets which schema to use, since ScriptedTransformRecord doesn't support changing the schema configuration internally.
Create a fat jar with Maven or Gradle that compiles your Avro schema into an object that can be used with the NiFi API to expose a static RecordSchema (NiFi API) to your script.
Write a Groovy script that generates a new MapRecord.
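The record reshaping those steps describe can be sketched in a few lines. This is plain Python for illustration only; the actual ScriptedTransformRecord script would be Groovy (or Jython) and would build a MapRecord against the output RecordSchema rather than a dict. The field names (`price`, `old_price`) are hypothetical stand-ins for the question's price fields.

```python
def transform(record, class_name, year):
    """Drop the extra price fields and add the two new fields."""
    out = {k: v for k, v in record.items()
           if not k.endswith("price")}   # drop price / *_price fields
    out["class"] = class_name            # add the two new fields
    out["year"] = year
    return out

row = {"id": 1, "name": "widget", "price": 9.99, "old_price": 12.50}
print(transform(row, "A", 2020))
# {'id': 1, 'name': 'widget', 'class': 'A', 'year': 2020}
```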

How to convert Avro to an SQL batch update?

I'm trying to convert an input Avro file (an array of Avro records) into a batch of upsert statements. Is there a processor that can do this?
[Read External DB (RDBMS)]->[Avro to Upsert batch]->[Local DB]
What I found is that the records can be formatted with sql.args.N.type and sql.args.N.value attributes before PutSQL. With this approach, is there a processor or a trick that can make this clean?
[Read External DB (RDBMS)]->[Split into 1]->[Convert Avro to sql.args.N.type and value format]->[SetAttribute: sql.statement=SQL]->[Local DB]
In the second case I'm stuck at [Convert Avro to sql.args.N.type and value format], and I'm trying to resist the urge to use ExecuteScript... What is the simplest way forward?
If you need to generate SQL (to do an upsert vs. an insert, for example), you could use ConvertJSONToSQL (assuming your content is JSON), which does all the sql.args.N work for you. If you use ExecuteSQLRecord or QueryDatabaseTableRecord, you can get your source DB information as JSON (by using a JsonRecordSetWriter), whereas the non-record-based versions only output Avro. Otherwise you'd need a ConvertAvroToJSON before ConvertJSONToSQL.
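For intuition, here is a Python sketch of the kind of output ConvertJSONToSQL hands to PutSQL: a parameterized statement plus `sql.args.N.type` / `sql.args.N.value` flowfile attributes. The table and column names are hypothetical; 12 and 4 are the `java.sql.Types` codes for VARCHAR and INTEGER.

```python
def json_to_sql_args(table, record, jdbc_types):
    """Build an INSERT with ? placeholders plus the sql.args.N.*
    attribute map that PutSQL uses to bind parameters."""
    cols = list(record)
    stmt = (f"INSERT INTO {table} ({', '.join(cols)}) "
            f"VALUES ({', '.join('?' for _ in cols)})")
    attrs = {}
    for i, col in enumerate(cols, start=1):  # parameters are 1-indexed
        attrs[f"sql.args.{i}.type"] = str(jdbc_types[col])
        attrs[f"sql.args.{i}.value"] = str(record[col])
    return stmt, attrs

stmt, attrs = json_to_sql_args("users", {"name": "bob", "age": 42},
                               {"name": 12, "age": 4})
print(stmt)                       # INSERT INTO users (name, age) VALUES (?, ?)
print(attrs["sql.args.1.value"])  # bob
```

Turning the INSERT into an upsert (e.g. `ON DUPLICATE KEY UPDATE` for MySQL) would be a dialect-specific tweak to the generated statement.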

Apache NiFi for data masking

We are using NiFi as our main data ingestion engine. NiFi is used to ingest data from multiple sources like DBs, blob storage, etc., and all of the data is pushed to Kafka (with Avro as the serialization format). Now, one of the requirements is to mask specific fields (PII) in the input data.
Is NiFi a good tool to do that?
Does it have any processor to support data masking/obfuscation?
NiFi comes with the EncryptContent, CryptographicHashContent, and CryptographicHashAttribute processors, which can be used to encrypt or hash content and attributes respectively.
I would look into these first.
In addition, ReplaceText can do simple masking. An ExecuteScript processor can perform custom masking, or a combination of UpdateRecord with a ScriptedRecordSetWriter can easily mask certain fields in a record.
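The field-level masking that an UpdateRecord or scripted writer would apply boils down to logic like the following Python sketch. The field names are hypothetical, and a real deployment should use a keyed hash (e.g. HMAC with a secret) rather than bare SHA-256, since PII with low entropy can otherwise be brute-forced.

```python
import hashlib

HASH_FIELDS = {"email"}   # pseudonymize: stable hash, still joinable across records
REDACT_FIELDS = {"ssn"}   # redact: value is unrecoverable

def mask(record):
    out = dict(record)
    for k in HASH_FIELDS & out.keys():
        out[k] = hashlib.sha256(out[k].encode()).hexdigest()[:16]
    for k in REDACT_FIELDS & out.keys():
        out[k] = "***"
    return out

print(mask({"user": "bob", "email": "bob@example.com", "ssn": "123-45-6789"}))
```

The hash-vs-redact split matters downstream: hashed fields can still be used as join keys, redacted ones cannot.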

Kafka Connect - modifying records before writing to the sink

I have installed Kafka Connect using Confluent 4.0.0.
Using the HDFS connector, I am able to save Avro records received from a Kafka topic to Hive.
I would like to know if there is any way to modify the records before writing to the HDFS sink.
My requirement is to make small modifications to the values of the records, for example performing arithmetic operations on integers or manipulating strings.
Please suggest if there is any way to achieve this.
You have several options.
Single Message Transforms (SMTs), which you can see in action here. Great for lightweight changes as messages pass through Connect. Configuration-file based, and extensible using the provided API if there isn't an existing transform that does what you want.
See the discussion here on when SMTs are suitable for a given requirement.
KSQL is a streaming SQL engine for Kafka. You can use it to modify your streams of data before sending them to HDFS. See this example here.
KSQL is built on the Kafka Streams API, which is a Java library that gives you the power to transform your data as much as you'd like. Here's an example.
Take a look at the Kafka Connect transforms [1] & [2]. You can build a custom transform library and use it in your connector.
[1] http://kafka.apache.org/documentation.html#connect_transforms
[2] https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect
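To make the SMT option concrete: transforms are declared in the connector's properties. A minimal sketch, assuming a hypothetical `price` field and using the built-in Cast and InsertField transforms (note that no built-in SMT does arithmetic, so a change like "add 1 to an integer" would need a custom transform, KSQL, or Kafka Streams):

```properties
# Chain of two transforms applied to each record's value before the sink writes it
transforms=castPrice,addSource
transforms.castPrice.type=org.apache.kafka.connect.transforms.Cast$Value
transforms.castPrice.spec=price:float64
transforms.addSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addSource.static.field=source
transforms.addSource.static.value=kafka
```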

The ExecuteSQL processor in NiFi returns data in Avro format

I just started working with Apache NiFi. I am trying to fetch data from Oracle and place it in HDFS, then build an external Hive table on top of it. The problem is that the ExecuteSQL processor returns data in Avro format. Is there any way I can get this data in a readable format?
Apache NiFi also has a ConvertAvroToJSON processor. That might help you get it into a readable format. We also really need to just knock out the ability for our content viewer to nicely render Avro data, which would help as well.
Thanks
joe
