Using NiFi 1.11.3, I am not able to find the processor called PutHive3QL.
This processor does not show up in the "Add processor" panel. I only have PutHiveQL.
How can I add this processor, or where can I find it?
The Hive 3 components are not included with the NiFi distribution due to size limitations, but they are built and published as part of the release process. For version 1.11.3, you can find the NAR here.
I am having trouble using AvroParquetReader inside a Flink application (Flink >= 1.15).
Motivation (a.k.a. why I want to use it)
According to the official docs, Parquet files can be read in Flink via a FileSource. However, I only want to write a function that loads a Parquet file into Avro records without creating a DataStreamSource. In particular, I want to load Parquet files into a FileInputFormat, which is a completely separate API (for some odd reason), and digging one level deeper, I could not see an easy way to convert a BulkFormat or StreamFormat into it.
Therefore, it would be much simpler to use org.apache.parquet.avro.AvroParquetReader to read the files directly.
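Roughly, the helper I have in mind would look something like the sketch below (the ParquetToAvro class and readAll method are names I made up for illustration; it assumes parquet-avro and the Hadoop client libraries are on the classpath):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class ParquetToAvro {

    // Read every record of a Parquet file (local, HDFS, or s3a:// URI) into Avro GenericRecords.
    public static List<GenericRecord> readAll(String uri) throws IOException {
        Configuration conf = new Configuration();
        List<GenericRecord> records = new ArrayList<>();
        try (ParquetReader<GenericRecord> reader =
                 AvroParquetReader.<GenericRecord>builder(
                     HadoopInputFile.fromPath(new Path(uri), conf)).build()) {
            GenericRecord record;
            while ((record = reader.read()) != null) {
                records.add(record);
            }
        }
        return records;
    }
}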
Error description
However, I hit this error after running the Flink application locally: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found.
This is quite unexpected, since the flink-s3-fs-hadoop jar has already been loaded via the plugin system (and its path has been added to HADOOP_CLASSPATH as well), so not only does Flink know where it is, the local Hadoop installation should too.
Comments:
Without AvroParquetReader, the Flink app can write to S3 without problems.
The Hadoop installation is not a Flink-shaded one, but a separately installed Hadoop 2.10.
I would love to hear any insights you have about this.
AvroParquetReader should be able to read the Parquet files without problems.
There is an official Hadoop guide with some potential fixes for the issue, which can be found here. If I recall correctly, this issue was caused by missing Hadoop AWS dependencies.
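As a minimal sketch of what that fix usually looks like (this assumes the Hadoop Configuration is created in your own code, and that you add hadoop-aws for your Hadoop version, plus its AWS SDK dependency, to the application classpath; jars loaded through Flink's plugins/ directory sit in an isolated classloader and are not visible to user code):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class S3ACheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Only resolvable if hadoop-aws (which provides S3AFileSystem) is on the
        // application classpath, not just inside Flink's plugins/ directory.
        conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");

        // "my-bucket" is a placeholder; this fails with the ClassNotFoundException above
        // if the S3A classes are still not visible (and needs valid AWS credentials beyond that).
        FileSystem fs = FileSystem.get(new URI("s3a://my-bucket/"), conf);
        System.out.println("Loaded filesystem implementation: " + fs.getClass().getName());
    }
}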
Is it possible to store external data (not a NiFi flow) in NiFi Registry using the REST API?
https://nifi.apache.org/docs/nifi-registry-docs/index.html
As far as I know, NiFi Registry is designed for versioning NiFi flows, but I want to know whether it is capable of storing other data and retrieving it based on versions.
As of today, it is not possible to store data/objects in NiFi Registry other than a NiFi flow and its configuration (component properties, default variable values, controller services, etc.).
There have been discussions about extending NiFi Registry's storage capabilities to other items. Most often discussed are NiFi extensions, such as NAR bundles, which are the archive format for components such as custom processors. This would allow custom components to be versioned in the same place as a flow and downloaded at runtime based on a flow definition, rather than pre-installed on NiFi/MiNiFi instances.
Today, though, only flows are supported. Other data or components have to be stored/versioned somewhere else.
If you have data you want to associate with a specific flow version snapshot, here is a suggestion: store that data externally in another service and use the flow version snapshot's comment field to hold a URI/link to where the associated data resides. If you use a machine-parsable format such as JSON in the snapshot comment to store this URI metadata, an automated process could retrieve the data from the external system by reading this field whenever it works with a specific flow snapshot version.
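As a rough sketch of that idea (the dataUri key and the URL below are invented conventions, not anything NiFi Registry defines), the automated process could parse the snapshot comment with any JSON library, e.g. Jackson:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SnapshotCommentLink {
    public static void main(String[] args) throws Exception {
        // Hypothetical comment stored on a flow version snapshot.
        String comment = "{\"dataUri\": \"https://artifacts.example.com/my-flow/v3/data.zip\"}";

        // Parse the JSON comment and pull out the link to the externally stored data.
        ObjectMapper mapper = new ObjectMapper();
        JsonNode node = mapper.readTree(comment);
        String dataUri = node.get("dataUri").asText();
        System.out.println("Associated data lives at: " + dataUri);
    }
}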
I am very new to Druid, a column-oriented, open-source, distributed data store written in Java.
I need to start multiple services (nodes) in order for Druid to work smoothly. Is there a good way to auto-start these services?
You can find a patch for Ambari Druid integration, AMBARI-17981, which will be included as of Ambari v2.5.
The patch file contains all of that information in the form of a diff.
Typically, you need to check out the source code, apply the patch, and then build the project.
Alternatively, you could use the Hortonworks Data Platform (HDP) distribution, which will install ZooKeeper/HDFS/Druid/PostgreSQL/Hadoop for you, and you are good to go.
There is also a video guide available on how to install Druid step by step.
Otherwise, you can do it yourself by building Druid from source and copying the jars and configs around.
In NiFi 1.2.0, using a two-node cluster, I have a simple flow with two processors:
GenerateFlowFile 1.2.0 - Generates data files
PutSFTP 1.2.0 - Puts files via SFTP
Often after I've started both processors and let them run for a short while, I can stop the GenerateFlowFile processor, but I'm not able to stop (or start, for that matter) the PutSFTP processor. The Start and Stop items don't display in the context menu, and I can only view and not edit the processor's configuration. The PutSFTP processor's status icon indicates that it is stopped.
I'm not convinced that the behavior I'm seeing is specific to PutSFTP processors.
Why might this processor be "unstoppable"?
This isn't a direct answer to the question, but I just noticed that, when I refresh my browser, the PutSFTP processor is startable again. The problem seems to lie with the web application failing to update the processor's context menu for some reason.
I'm using Chrome 62.0.3202.94 (64-bit).
I am creating a flow of tweets using NiFi and analyzing them in Solr. Tweets are coming into NiFi, but nothing is arriving in Solr. The PutSolrContentStream processor in NiFi reports an error: could not connect to localhost:2181/solr, cluster not found/not ready.
PutSolrContentStream processor error:
Are you running in clustered mode?
I just set up a local (standalone mode) Solr core, and in the Solr Location property I used http://localhost:8983/solr/myDemoCore. Might you be forgetting to include the core's name?
If you haven't created a core:
cd path/to/solr/bin/
./solr create -c myDemoCore
./solr restart
Then use http://localhost:8983/solr/myDemoCore in the Solr Location property and try again.
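If you want to double-check outside NiFi that the core is reachable at that URL, a quick SolrJ ping is one way to do it (a sketch, assuming SolrJ is on your classpath and the core name is myDemoCore as above):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.SolrPingResponse;

public class SolrCoreCheck {
    public static void main(String[] args) throws Exception {
        // Same URL that goes into the Solr Location property of PutSolrContentStream.
        try (SolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/myDemoCore").build()) {
            SolrPingResponse response = client.ping();
            System.out.println("Ping status: " + response.getStatus()); // 0 means the core is up
        }
    }
}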
Edit: I see that you're using Windows; just change your path notation accordingly.