ExecuteSQL processor in NiFi returns data in Avro format - Oracle

Just started working with Apache NiFi. I am trying to fetch data from Oracle and place it in HDFS, then build an external Hive table on top of it. The problem is that the ExecuteSQL processor returns data in Avro format. Is there any way I can get this data in a readable format?

Apache NiFi also has a ConvertAvroToJSON processor. That might help you get it into a readable format. We also really need to just knock out the ability for our content viewer to nicely render Avro data, which would help as well.
Thanks
Joe
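For the Hive side of this, once the Avro files from ExecuteSQL have landed in HDFS, an external table can be declared directly over that directory. A minimal sketch in HiveQL (the database name, columns and HDFS path below are made up for illustration):

    -- Hive 0.14+ understands STORED AS AVRO; older versions need
    -- ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' instead
    CREATE EXTERNAL TABLE staging.oracle_extract (
      id   BIGINT,
      name STRING
    )
    STORED AS AVRO
    LOCATION '/data/landing/oracle_extract';

With that in place the data stays in Avro on disk but is queryable as ordinary rows, so converting to JSON is mainly useful for eyeballing the flow files inside NiFi itself.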

Related

How to convert an Avro schema into line protocol in order to insert data into InfluxDB with Apache NiFi

I am creating a data pipeline with Apache NiFi to copy data from a remote MySQL database into InfluxDB.
I use the QueryDatabaseTable processor to extract the data from the MySQL database, then I use UpdateRecord to do some data transformation, and I would like to use PutInfluxDB to insert the time series into my local Influx instance on Linux.
The data coming from the QueryDatabaseTable processor uses an Avro schema, and I need to convert it into line protocol by configuring which fields are the tags and which are the measurement values.
However, I cannot find any processor that performs this conversion.
Any hints?
Thanks,
Bernardo
There is no built-in processor for InfluxDB Line Protocol conversions. You could write a ScriptedRecordWriter if you wanted to do it yourself; however, there is a project by InfluxData that already implements a Line Protocol reader for NiFi here, and it seems to be active and up to date.
See the documentation for adding it into NiFi here.
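For reference, each record ultimately has to become one line of InfluxDB line protocol: measurement name, then the tag set, then the field set, then an optional timestamp. For example (the measurement, tag and field names here are made up):

    weather,location=us-midwest temperature=82,humidity=41 1465839830100400200

So whichever route you take, the configuration comes down to declaring which Avro fields become tags, which become fields, and which one supplies the timestamp.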

How to configure the ConvertRecord processor in Apache NiFi so that it converts JSON to Avro format

I am not able to figure out how to configure this processor to convert the incoming JSON Twitter data to Avro, in order to put that data into a Hive table.
Please help.
Thank you in advance.
#RishiPandey
ConvertJsonToAvro is the processor you need.
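Note that ConvertJsonToAvro needs an Avro schema to write against, configured in the processor's schema property (the exact property name depends on your NiFi version). A minimal sketch of such a schema covering a couple of tweet fields (the record and field names here are only illustrative):

    {
      "type": "record",
      "name": "tweet",
      "fields": [
        {"name": "id",   "type": "long"},
        {"name": "text", "type": ["null", "string"], "default": null}
      ]
    }

Fields present in the incoming JSON but absent from the schema will typically be dropped during conversion, so list everything you want to keep for the Hive table.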

Do all three of Presto, Hive and Impala support the Avro data format?

I am clear about the SerDe available in Hive to support the Avro data format, and I am comfortable using Avro with Hive:
AvroSerDe
As an example, I have found this issue against Presto:
https://github.com/prestodb/presto/issues/5009
I need to choose components for a fast execution cycle, and Presto and Impala provide a much shorter execution cycle.
So could anyone please clarify which would be better across the different data formats?
Primarily, I am looking for Avro support in Presto right now.
However, let's consider the following data formats stored on HDFS:
Avro format
Parquet format
ORC format
Which is the best to use for high performance across these data formats?
Please suggest.
Impala can read Avro data but cannot write it. Please refer to this documentation page describing the file formats supported by Impala.
Hive supports both reading and writing Avro files.
Presto's Hive Connector supports Avro as well. Thanks to David Phillips for pointing out this documentation page.
There are different benchmarks on the internet about performance, but I would not like to link to a specific one as results heavily depend on the exact use case benchmarked.
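Since the honest answer on raw performance is to benchmark your own workload, note that switching formats in Hive is only a matter of the STORED AS clause, so producing the same table in each format for a side-by-side test is cheap. A rough sketch (table and source names are made up; CTAS into Avro needs Hive 0.14+):

    CREATE TABLE events_avro    STORED AS AVRO    AS SELECT * FROM events_source;
    CREATE TABLE events_parquet STORED AS PARQUET AS SELECT * FROM events_source;
    CREATE TABLE events_orc     STORED AS ORC     AS SELECT * FROM events_source;

You can then run your representative queries against each copy from Presto and Impala and compare.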

Apache NiFi/Hive - store merged tweets in HDFS, create table in Hive

I want to create the following workflow:
1. Fetch tweets using the GetTwitter processor.
2. Merge tweets into a bigger file using the MergeContent processor.
3. Store the merged files in HDFS.
4. On the Hadoop/Hive side, create an external table based on these tweets.
There are examples of how to do this, but what I am missing is how to configure the MergeContent processor: what to set as the header, footer and demarcator.
And what to use on the Hive side as a separator so that it will split the merged tweets into rows.
I hope I have described the problem clearly.
Thanks in advance.
The MergeContent processor in binary mode does the job fine. There is no need for a header, footer or demarcator.
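On the Hive side there is no column separator to configure either: the merged files are plain JSON documents, so a JSON SerDe takes care of splitting out the fields. A minimal sketch, assuming the merged tweets land under a made-up path and the JSON SerDe shipped with hive-hcatalog is on the classpath. If the tweets do not already sit one per line in the merged file, a newline demarcator in MergeContent ensures that, since Hive's text input format treats each line as one row.

    CREATE EXTERNAL TABLE tweets (
      id   BIGINT,
      text STRING,
      lang STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION '/data/tweets';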

Is there a way to access Avro data stored in HBase using Hive to do analysis?

My HBase table has rows that contain both serialized Avro (put there using HAvroBase) and string data. I know that a Hive table can be mapped to Avro data stored in HDFS for data analysis, but I was wondering if anyone has tried to map Hive to HBase table(s) that contain Avro data. Basically, I need to be able to query both Avro and non-Avro data stored in HBase, do some analysis, and store the result in a different HBase table. I need the capability to do this as a batch job as well. I don't want to write a Java MapReduce job to do this because we have constantly changing configurations and we need to use a scripted approach. Any suggestions? Thanks in advance!
You can write an HBase coprocessor to expose the Avro record as regular HBase qualifiers. You can see an implementation of that in Intel's panthera-dot.
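Once the coprocessor (or any other mechanism) exposes the Avro fields as plain qualifiers, the HBase table can be mapped into Hive with the standard HBase storage handler and then queried with ordinary HiveQL as a batch job. A sketch with made-up table, column family and qualifier names:

    CREATE EXTERNAL TABLE hbase_events (
      rowkey  STRING,
      status  STRING,
      payload STRING
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,d:status,d:payload')
    TBLPROPERTIES ('hbase.table.name' = 'events');

Results can be written back to a second HBase table through another table mapped the same way, which covers the analyse-and-store-elsewhere step without any custom MapReduce code.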
