Apache NiFi - create a new pipeline using the API

Can we create a new pipeline using the Apache NiFi API without using the GUI? If yes, please let me know the steps to do so.

The answer to your question is yes. You can use:
the NiFi REST API,
the NiFi CLI (available from version 1.6),
the NiPyApi Python client, thanks to @Chaffelson.
You can find the documentation here:
https://github.com/apache/nifi/tree/master/nifi-toolkit/nifi-toolkit-cli
https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
You can also search the Hortonworks community site; there is a lot of content there that can be helpful.
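
For example, the REST API lets you build a flow programmatically by creating process groups, processors and connections. Below is a minimal sketch using Python's requests library against an unsecured NiFi instance; the base URL and the group name are placeholders for your environment.

```python
import requests

NIFI_API = "http://localhost:8080/nifi-api"  # assumption: default unsecured NiFi port

# Look up the id of the root process group (the top-level canvas).
root = requests.get(f"{NIFI_API}/process-groups/root").json()
root_id = root["id"]

# Create a new, empty process group under the root canvas to hold the pipeline.
payload = {
    "revision": {"version": 0},
    "component": {"name": "my-new-pipeline", "position": {"x": 100.0, "y": 100.0}},
}
resp = requests.post(f"{NIFI_API}/process-groups/{root_id}/process-groups", json=payload)
resp.raise_for_status()
print("Created process group:", resp.json()["id"])
```

From there you would POST processors and connections into the new group in the same fashion; on a secured instance you also need to authenticate (token or client certificates) first.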

If you are familiar with Python, there is also a community Python client for NiFi.
https://github.com/Chaffelson/nipyapi
And a quick introduction here:
https://community.hortonworks.com/articles/167364/nifi-sdlc-automation-in-python-with-nipyapi-part-1.html
note: I am the primary author.
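
As a taste of what the client looks like, here is a rough sketch that creates a process group and drops a processor into it with NiPyApi (pip install nipyapi). The URL and names are placeholders, and exact call signatures can vary between nipyapi versions, so treat it as an outline rather than a recipe.

```python
import nipyapi

nipyapi.config.nifi_config.host = "http://localhost:8080/nifi-api"  # placeholder URL

# Fetch the root canvas and create a child process group on it.
root_pg = nipyapi.canvas.get_process_group(nipyapi.canvas.get_root_pg_id(), "id")
new_pg = nipyapi.canvas.create_process_group(root_pg, "my-new-pipeline", location=(400.0, 400.0))

# Drop a simple processor into the new group as a starting point.
proc_type = nipyapi.canvas.get_processor_type("GenerateFlowFile")
nipyapi.canvas.create_processor(new_pg, proc_type, location=(400.0, 400.0), name="generate-test-data")
```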

Related

Automate NiFi Deployment

I am looking for the best approaches for deploying NiFi flows from my DEV environment to TEST/PROD environments.
The links below give an overview of how we can achieve this; basically, they explain that we have to make use of the NiFi CLI to automate the deployment.
https://pierrevillard.com/2018/04/09/automate-workflow-deployment-in-apache-nifi-with-the-nifi-registry/
https://bryanbende.com/development/2018/01/19/apache-nifi-how-do-i-deploy-my-flow
But I was wondering whether there is an option to create a general script that can be used for deploying different types of flows. Since the variables we need to set differ from one processor to another, I am not sure how to do this.
Any help is appreciated
I am the primary maintainer of NiPyApi, a Python client for working with Apache NiFi. I have an example script covering the steps you are requesting, though it is not part of the official Apache project.
https://github.com/Chaffelson/nipyapi/blob/master/nipyapi/demo/fdlc.py
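
To sketch how that demo's approach can be generalized: keep the flow itself versioned in NiFi Registry, and keep the environment-specific values (the per-flow variables) in a separate dict, so a single script can deploy any flow and apply its variables afterwards. The following is a rough, unofficial outline with NiPyApi; every host, bucket, flow and variable name is a placeholder, and signatures may differ between nipyapi versions, so check the fdlc.py demo for working code.

```python
import nipyapi

nipyapi.config.nifi_config.host = "http://prod-nifi:8080/nifi-api"  # placeholder PROD instance

# Look up the registry client, bucket and flow that PROD is connected to.
reg_client = nipyapi.versioning.get_registry_client("prod-registry")
bucket = nipyapi.versioning.get_registry_bucket("released-flows")
flow = nipyapi.versioning.get_flow_in_bucket(bucket.identifier, identifier="my-flow")

# Deploy the latest version of the flow onto the PROD root canvas.
deployed_pg = nipyapi.versioning.deploy_flow_version(
    parent_id=nipyapi.canvas.get_root_pg_id(),
    location=(200.0, 200.0),
    bucket_id=bucket.identifier,
    flow_id=flow.identifier,
    reg_client_id=reg_client.id,
    version=None,  # None means "latest"
)

# The flow-specific settings live outside the script, so the same script
# works for different flows: swap in a different variables dict per flow/environment.
prod_variables = {"db.host": "prod-db.example.com", "batch.size": "1000"}
nipyapi.canvas.update_variable_registry(deployed_pg, list(prod_variables.items()))
```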

How to use Druid with Ambari?

I am very new to Druid, a column-oriented, open-source, distributed data store written in Java.
I need to start multiple services (nodes) in order for Druid to work smoothly. Is there a good way to auto-start the services?
You can find the patch for Ambari Druid integration in AMBARI-17981, which will be included as of Ambari v2.5.
The patch file contains all of that information in the form of a diff.
Typically you need to check out the source code, apply the patch, and then build the project.
Alternatively, you could use the Hortonworks Data Platform (HDP) distribution, which will install ZooKeeper/HDFS/Druid/PostgreSQL/Hadoop, and you are good to go.
There is also a video guide available on how to install Druid step-by-step.
Otherwise you can do it yourself by building Druid from source and copying the jars and configs around.

Issue while Fetching tweets using Flume

I'm able to fetch tweets using Flume; however, the language in which they are streamed is not what I want. Below is the flume.conf file.
The tweets that I'm getting are shown below:
Can anyone suggest the changes that I need to make?
The TwitterSource in Apache Flume currently does not implement support for language filtering. This prior question describes a procedure (admittedly complex) by which you could deploy your own patched version of the code with language support:
Flume - TwitterSource language filter
I think it would be a valuable enhancement for Apache Flume to support language filtering. I encourage you to file a request in Apache JIRA in the FLUME project.
If you're interested, please also consider contributing a patch. I think it would just be a matter of pulling the "language" setting out of configuration in the configure method, saving it in a member variable, and then passing it along in the Twitter4J APIs.

How to use the Solr API over Elasticsearch

I am in the process of analyzing the steps involved in migrating from Solr to Elasticsearch. While doing so, we came across a plugin called the Mock Solr API.
The plugin can be found at https://github.com/mattweber/elasticsearch-mocksolrplugin, but I am not able to install it. Could anyone please help with the steps or suggest some alternative ways to achieve this?
Thanks

Accessing Hadoop data using REST service

I am trying to update the HDP architecture so that data residing in Hive tables can be accessed by REST APIs. What are the best approaches for exposing data from HDP to other services?
This is my initial idea:
I am storing data in Hive tables and I want to expose some of the information through a REST API, so I thought that using HCatalog/WebHCat would be the best solution. However, I found out that it only allows querying metadata.
What are the options that I have here?
Thank you
You can very well use WebHDFS, which is basically a REST service over HDFS.
Please see documentation below:
https://hadoop.apache.org/docs/r1.0.4/webhdfs.html
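
As a rough illustration of what that looks like from a client, here is a minimal sketch using Python's requests library; the NameNode host and port (50070 here, typically 9870 on Hadoop 3.x), the paths and the user name are all placeholders for your cluster.

```python
import requests

NAMENODE = "http://namenode-host:50070"  # placeholder NameNode address

# List a directory in HDFS.
listing = requests.get(
    f"{NAMENODE}/webhdfs/v1/user/hive/warehouse",
    params={"op": "LISTSTATUS", "user.name": "hdfs"},
).json()
for entry in listing["FileStatuses"]["FileStatus"]:
    print(entry["pathSuffix"], entry["type"])

# Read a file (WebHDFS redirects the OPEN call to a DataNode).
data = requests.get(
    f"{NAMENODE}/webhdfs/v1/tmp/example.txt",
    params={"op": "OPEN", "user.name": "hdfs"},
    allow_redirects=True,
)
print(data.text)
```

Keep in mind that WebHDFS exposes the files underneath the Hive tables, not SQL-level access to the tables themselves.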
The REST API gateway for the Apache Hadoop ecosystem is Apache Knox.
I would check it out before exploring any other options. In other words, do you have any reason to avoid using Knox?
What version of HDP are you running?
The Knox component has been available for quite a while and is manageable via Ambari.
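
For example, a client can reach WebHDFS (and other proxied services) through the gateway instead of talking to the NameNode directly. Here is a hedged sketch with Python's requests library; the gateway host, the "default" topology name and the credentials are placeholders for your deployment.

```python
import requests

KNOX = "https://knox-host:8443/gateway/default"  # placeholder gateway URL and topology

resp = requests.get(
    f"{KNOX}/webhdfs/v1/tmp",
    params={"op": "LISTSTATUS"},
    auth=("myuser", "mypassword"),   # Knox typically authenticates the caller itself
    verify=False,                    # use the gateway's CA certificate in real use
)
print(resp.json())
```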
Can you get an instance of HiveServer2 running in HTTP mode?
This would give you SQL access through J/ODBC drivers without requiring Hadoop config and binaries (other than those required for the drivers) on the client machines.
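
If you go that route, a client-side call might look like the following: a speculative sketch of connecting to HiveServer2 in HTTP mode from Python using PyHive with a Thrift HTTP transport. The host, port (10001 is the usual HTTP-mode default), the "cliservice" httpPath, the credentials and the table name are placeholders, and your cluster's authentication setup may require more than this.

```python
import base64

from pyhive import hive
from thrift.transport import THttpClient

# HiveServer2 in HTTP mode listens on an HTTP endpoint (httpPath, usually "cliservice").
transport = THttpClient.THttpClient("http://hiveserver2-host:10001/cliservice")

# Some HTTP-mode setups expect a Basic Authorization header even without Kerberos.
credentials = base64.b64encode(b"hive:password").decode("ascii")
transport.setCustomHeaders({"Authorization": f"Basic {credentials}"})

conn = hive.connect(thrift_transport=transport)
cursor = conn.cursor()
cursor.execute("SELECT * FROM my_table LIMIT 10")
for row in cursor.fetchall():
    print(row)
```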