Automate NiFi Deployment - apache-nifi

I am looking for best approaches for deploying NiFi flows from my DEV environment to TEST/PROD environments.
The links below give an overview of how to achieve this; basically they explain that we have to make use of the NiFi CLI to automate the deployment.
https://pierrevillard.com/2018/04/09/automate-workflow-deployment-in-apache-nifi-with-the-nifi-registry/
https://bryanbende.com/development/2018/01/19/apache-nifi-how-do-i-deploy-my-flow
But I was wondering if there is an option to create a general script which can be used for deploying different types of flows. Since the variables that we need to set for one processor differ from another, I am not sure how we can do that.
Any help is appreciated

I am the primary maintainer of NiPyAPI, a Python client for working with Apache NiFi. I have an example script covering the steps you are requesting, though it is not part of the official Apache project.
https://github.com/Chaffelson/nipyapi/blob/master/nipyapi/demo/fdlc.py
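Not the script itself, but a minimal sketch of the kind of promotion it automates, assuming the flow is already version-controlled in a NiFi Registry bucket that both environments can reach; the hosts, bucket, flow, and registry client names are placeholders, and function names may differ slightly between NiPyAPI versions:

```python
# Rough sketch: promote a versioned flow to a target environment with NiPyAPI.
# Placeholder hosts/names; assumes the flow already exists in the Registry.
import nipyapi

# Point the client at the *target* environment (e.g. TEST or PROD)
nipyapi.config.nifi_config.host = 'http://test-nifi:8080/nifi-api'
nipyapi.config.registry_config.host = 'http://registry:18080/nifi-registry-api'

# Look up the bucket/flow that was versioned from DEV
bucket = nipyapi.versioning.get_registry_bucket('my-bucket')
flow = nipyapi.versioning.get_flow_in_bucket(bucket.identifier, identifier='my-flow')
reg_client = nipyapi.versioning.get_registry_client('my-registry-client')

# Deploy the latest version under the root process group of the target NiFi
root_pg_id = nipyapi.canvas.get_root_pg_id()
pg = nipyapi.versioning.deploy_flow_version(
    parent_id=root_pg_id,
    location=(200.0, 200.0),
    bucket_id=bucket.identifier,
    flow_id=flow.identifier,
    reg_client_id=reg_client.id,
    version=None,  # None = latest version in the Registry
)

# Environment-specific values are applied per process group, which is how one
# generic script can serve different flows with different variables.
nipyapi.canvas.update_variable_registry(pg, [('db.host', 'test-db.example.com')])
nipyapi.canvas.schedule_process_group(pg.id, scheduled=True)
```

The per-flow differences then live in configuration (e.g. a properties or YAML file per environment) rather than in the script itself.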

Related

Apache NiFi deployment in production

I am new to Apache NiFi. From the documentation, I could understand that NiFi is a framework with a drag and drop UI that helps to build data pipelines. This NiFi flow can be exported into a template which is then saved in Git.
Are we supposed to import this template into production NiFi server? Once imported, are we supposed to manually start all the processors via the UI? Please help.
Templates are just example flows to share with people and are not really meant for deployment. Please take a look at NiFi Registry and the concept of versioned flows.
https://nifi.apache.org/registry.html
https://www.youtube.com/watch?v=X_qhRVChjZY&feature=youtu.be
https://bryanbende.com/development/2018/01/19/apache-nifi-how-do-i-deploy-my-flow
A template is an XML representation of your process structure (processors, process groups, controllers, relationships, etc.). You can upload it to another NiFi server to deploy it. You can also start all the processors via the NiFi REST API.
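As a rough illustration of that last point, here is a minimal sketch that starts every component in a process group through the REST API; the host and process group ID are placeholders, and you would need to add authentication/TLS handling for a secured cluster:

```python
# Minimal sketch: start all components in a process group via the NiFi REST API.
import requests

nifi = 'http://localhost:8080/nifi-api'
pg_id = 'root'  # or a specific process group UUID

resp = requests.put(
    f'{nifi}/flow/process-groups/{pg_id}',
    json={'id': pg_id, 'state': 'RUNNING'},  # use 'STOPPED' to stop everything
)
resp.raise_for_status()
```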

How to use Druid with Ambari?

I am very new to Druid, a column-oriented, open-source, distributed data store written in Java.
I need to start multiple services (nodes) in order for Druid to work smoothly. Is there a good way to auto-start the services?
You can find the patch for the Ambari Druid integration in AMBARI-17981, which will be included as of Ambari v2.5.
The patch file contains all of that information in the form of a diff.
Typically you need to check out the source code, apply the patch, and then build the project.
You could use the Hortonworks Data Platform (HDP) distribution, which will install ZooKeeper/HDFS/Druid/PostgreSQL/Hadoop, and you are good to go.
There is also a video guide available on how to install Druid step by step.
Otherwise you can do it yourself by building Druid from source and copying the jars and configs around.

Synchronizing Ambari cluster configurations

We have been exploring Apache Ambari with HDP 2.2 to set up a cluster. Our backend features three environments: testing, staging and production, which is standard practice in our industry.
When we deploy a cluster in the testing environment with Ambari, what is the easiest way to have the same cluster configuration in the staging and, later, the production environment?
The initial step seems easy: you create a cluster in the testing environment using the UI and then you export the configuration as a blueprint. Subsequently, you use the exported blueprint to create a new cluster in the other environments. So far, so good.
Inevitably, we will need to change our Ambari configuration (e.g. deploy a new service, increase heap sizes for the JVMs, ...). I was hoping we could just update the blueprint (using the UI or by hand) and then use the updated blueprint to also update the different clusters. However, this does not seem possible unless you destroy and recreate the cluster, which seems a bit harsh (we don't want to lose our data).
Alternatively, we could use the REST API of Ambari to make specific updates to the configuration, but as configuration changes with respect to the initial blueprint will undoubtedly accumulate, I am afraid this will prove unwieldy and unmaintainable over time.
Can you suggest a better solution for this use case?
I believe the easiest way would be to dump each service's configuration to a file, then import each of those configurations into the other clusters. This could be done simply by using the Ambari API or by using the script provided by Ambari to update configurations (/var/lib/ambari-server/resources/scripts/configs.sh).
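For example, here is a rough sketch of the "dump each service's configuration to a file" step using the Ambari REST API from Python; the endpoints follow the Ambari v1 API, and the host, cluster name, and credentials are placeholders:

```python
# Rough sketch: dump the current desired configuration of every config type
# to JSON files via the Ambari REST API (v1). Placeholder host/credentials.
import json
import requests

AMBARI = 'http://ambari-host:8080/api/v1'
CLUSTER = 'testing'
AUTH = ('admin', 'admin')
HEADERS = {'X-Requested-By': 'ambari'}

# 1. Find the current tag for each config type (hdfs-site, yarn-site, ...)
desired = requests.get(
    f'{AMBARI}/clusters/{CLUSTER}?fields=Clusters/desired_configs',
    auth=AUTH, headers=HEADERS,
).json()['Clusters']['desired_configs']

# 2. Fetch and save the properties behind each type/tag pair
for cfg_type, info in desired.items():
    cfg = requests.get(
        f"{AMBARI}/clusters/{CLUSTER}/configurations?type={cfg_type}&tag={info['tag']}",
        auth=AUTH, headers=HEADERS,
    ).json()['items'][0]
    with open(f'{cfg_type}.json', 'w') as fh:
        json.dump(cfg['properties'], fh, indent=2)

# Importing into another cluster is then one update per config type, either via
# a PUT that creates a new desired_config or via configs.sh on the Ambari host.
```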

Setting up Hadoop in a public cloud

As a part of my college project, I would like to modify Hadoop's source code. However, the problem is that I would need at least 20 systems to test it. Is it possible to set up this modified version of Hadoop in public clouds such as Google Cloud Platform or Amazon Web Services? Can you give me an idea of the procedure to follow? I could only find information about setting up the original Hadoop versions in a public cloud; I couldn't find anything relevant to my case. Please do help me out.
Amazon offers Elastic MapReduce (EMR), but as you correctly pointed out, you will not be able to deploy your version of Hadoop there.
However, you can still use Amazon or Google Cloud to get base Linux servers and install your Hadoop on them. It is just a longer process, but not different from any other Hadoop installation if you have done it before.

Sensu AWS plugin to get EC2 metrics for instances under a load balancer

I have been trying to write an AWS Sensu plugin which will get the instance IDs of all the healthy instances under a load balancer and then get the stats for each of those instances, like CPU utilization, network in, network out, etc., and generate graphs using Graphite and Grafana.
I searched the open-source plugins in the Sensu community but could not find any. Is it possible to write a script or plugin for this? Or has anyone done it before?
Kindly help me out.
I don't believe a Sensu-specific plugin exists for this. However, since Sensu can run any Nagios plugin, you could use one of those: This one looks like it would get basic information on how many hosts are healthy. You could also write your own plugin using your language of choice (check out the available SDKs) to get more detailed metrics for each of the instances.
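If you go the "write your own" route, here is a rough sketch of the AWS side in Python with boto3 (the load balancer name, region, and time window are placeholders); formatting the output for Sensu/Graphite would still be up to you:

```python
# Rough sketch: list healthy instances behind a classic ELB and pull basic
# CloudWatch metrics for each one with boto3. Placeholder ELB name and region.
from datetime import datetime, timedelta
import boto3

elb = boto3.client('elb', region_name='us-east-1')
cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

# Instances the load balancer currently reports as InService
states = elb.describe_instance_health(LoadBalancerName='my-elb')['InstanceStates']
healthy = [s['InstanceId'] for s in states if s['State'] == 'InService']

end = datetime.utcnow()
start = end - timedelta(minutes=10)

for instance_id in healthy:
    for metric in ('CPUUtilization', 'NetworkIn', 'NetworkOut'):
        stats = cloudwatch.get_metric_statistics(
            Namespace='AWS/EC2',
            MetricName=metric,
            Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
            StartTime=start,
            EndTime=end,
            Period=300,
            Statistics=['Average'],
        )
        for point in stats['Datapoints']:
            # Emit in whatever form Graphite expects: dotted path, value, timestamp
            print(f"aws.ec2.{instance_id}.{metric} "
                  f"{point['Average']} {int(point['Timestamp'].timestamp())}")
```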
I wrote a plugin to do the same. It used to work fine at the time; I have not tested it on newer versions of the API. Let me know if you face any problems and I will help fix them.
