Adding Apache NiFi to an existing Hortonworks HDP cluster

I have a 6-node cluster running Hortonworks HDP 2.5.3 and Ambari 2.4.2.0.
I want to install Apache NiFi on this cluster. In the documentation, the following line caught my eye:
1.1. Interoperability Requirements
You cannot install HDF on a system where HDP is already installed.
I wonder how I can install NiFi on my cluster. I would like to manage it with Ambari too, if possible.
Should I just go ahead and install the standalone version of NiFi and change the port to something other than 8080, which is in use by Ambari? The problem is that I'd have to install it on every node, and this process is not automated.

Currently you can only install one stack into a given Ambari instance, and there is an HDP stack which does not include NiFi, and an HDF stack which includes NiFi, Kafka, Storm, and Ranger. So you need a second Ambari instance where you can install the HDF stack. You also can't share nodes between two Ambaris because there can only be one Ambari agent running on a node.
There might be enhancements in future Ambari releases to improve this situation, but for now if you are limited to using your 6 HDP nodes then you would have to install/manage NiFi manually using the RPM or TAR.
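If you go the manual route, the clash with Ambari on port 8080 is handled in NiFi's conf/nifi.properties through the nifi.web.http.port property. A minimal Python sketch of that edit, assuming the file has already been read as text and that port 9090 is free on the node (the helper name and the port are illustrative, not taken from the NiFi documentation):

```python
def set_nifi_http_port(properties_text, port):
    """Return the properties text with nifi.web.http.port set to `port`."""
    key = "nifi.web.http.port"
    out = []
    replaced = False
    for line in properties_text.splitlines():
        if line.strip().startswith(key + "="):
            # Overwrite the existing setting (8080 in a default install).
            out.append(f"{key}={port}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        # The key was absent, so append it.
        out.append(f"{key}={port}")
    return "\n".join(out) + "\n"


sample = "nifi.web.http.host=\nnifi.web.http.port=8080\n"
print(set_nifi_http_port(sample, 9090), end="")
```

The same function could be run on each node by whatever tool you already use to push files (Puppet, Chef, a shell loop over ssh), which addresses the "not automated" concern only partially: NiFi itself still has to be unpacked and started per node.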

As of HDP 2.6.1 it is possible to install HDF components on an HDP cluster. See https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.1.1/bk_installing-hdf-and-hdp/content/ch_install-ambari.html

As of HDP 3.0, HDF 3.2 can be added to the same cluster, so NiFi can run alongside HDP.

Related

how to upgrade components in ambari

I would love to have Hadoop and a few other packages in newer versions than the current Ambari with HDP 2.6.3 allows.
Is there an option for this kind of single components version upgrades?
This feature will not be ready until Ambari 3.0. See AMBARI-18678 & AMBARI-14714
Depending on what you want to upgrade, though, I wouldn't suggest it.
For example, HBase, Hive and Spark do not yet support Hadoop 3.0. The streaming components of HDP like Spark, Kafka, NiFi seem to release versions more frequently, and there are ways outside of Ambari to upgrade those.
You don't need HDP, or Ambari to manage your Hadoop, but it does make a nice package and central management component for the cluster. If you upgrade pieces on your own, you risk incompatibility.
HDP is tested as a whole unit. The Hortonworks repos that you set up in Ambari limit which component versions are available to you, but this does not stop you from using your own repositories plus Puppet/Chef to install additional software into your Hadoop environment. The only thing you lose at that point is management and configuration from Ambari.
You could try to define your own Ambari Mpacks to install additional software, but make sure you have the resources to maintain it.
Ambari upgrade steps are well documented in the Hortonworks documentation. You can follow the links below for upgrading Ambari and the Hadoop components:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.0.0/bk_ambari-upgrade/content/upgrading_ambari.html
https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.0.0/bk_ambari-upgrade/content/upgrading_hdp_stack.html
All 2.6 package URLs are listed at the link below:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-installation/content/hdp_26_repositories.html
You can upgrade single components (Hadoop, which includes both HDFS and YARN; Hive; Oozie; etc.) using yum, apt-get, or another package manager. However, upgrading a single component in a Hadoop cluster is not recommended because of dependency issues: services may fail. It is better to upgrade the complete HDP stack than individual components.
You also need to check Ambari version compatibility in the Hortonworks documentation. If you plan to upgrade only the Hadoop core packages without Ambari, cluster monitoring might otherwise fail.

Apache ambari installation

I'm trying to install Ambari server and agents, and I have a question about Ambari.
Whenever I install it, it ends up tied to Hortonworks.
I have a Hadoop cluster of my own on Ubuntu 16.0. Will Ambari only work with HDP, or is it possible to make it work with custom-built clusters?
If possible, please share detailed documentation.
It's not clear where you downloaded Ambari from, but it sounds like you used the Hortonworks version of it rather than getting it directly from https://ambari.apache.org
Ambari works with the concept of stacks. Each stack has a set of services and components. HDP is such a stack, but there are others, or you can even define your own, so yes, you can manage your own Hadoop installation components, but that really would be not much different from what Hortonworks already provides.
Besides, the HDP services and components have been tested to work together more thoroughly than an off-the-shelf Hadoop installation.
If you don't want HDP components, there is also the Apache Bigtop project, which provides installation packs for many Hadoop-related services.
Ambari expects Java and Hadoop to be installed in a certain way. I'm not sure how easy it is to set up for an existing Hadoop install.

Where to install Java on multi-node hadoop cluster?

In a multi-node Hadoop cluster with multiple slave nodes, one master node, and one client node, on which nodes does Java need to be installed?
Also, does Hadoop need to be installed only on the client node? I get confused by sites that say Java must be installed first but do not mention on which node.
Java is a prerequisite for running Hadoop. You need to install Java on all the machines, including the client.
As for client configuration: there is no need to install Hadoop on the client machine. It only communicates with the Hadoop cluster.
Check the links below for more:
Hadoop Client Node Configuration
https://pravinchavan.wordpress.com/2013/06/18/submitting-hadoop-job-from-client-machine/
Java is the prerequisite for running Hadoop. It should be installed on all master and slave nodes.
You can refer to the Hadoop multi-node cluster setup document for more details.
The JDK should be installed on all the nodes, as it is the primary requirement for Hadoop to work.
Make sure you install the same version of Java on all the nodes.
Oracle Java is preferred over OpenJDK.
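One way to check the "same version on all nodes" advice is to collect the output of `java -version` from each machine and compare the quoted version strings. A rough Python sketch, assuming you have already gathered that output somehow (for example over ssh); the host names and sample strings are made up for illustration:

```python
import re


def parse_java_version(version_output):
    """Extract the quoted version string from `java -version` output."""
    match = re.search(r'version "([^"]+)"', version_output)
    return match.group(1) if match else None


def group_nodes_by_java_version(outputs_by_node):
    """Map each reported Java version to the sorted list of nodes reporting it."""
    by_version = {}
    for node, output in outputs_by_node.items():
        by_version.setdefault(parse_java_version(output), []).append(node)
    return {version: sorted(nodes) for version, nodes in by_version.items()}


# Hypothetical output collected from three nodes.
outputs = {
    "master1": 'java version "1.8.0_181"',
    "worker1": 'java version "1.8.0_181"',
    "client1": 'openjdk version "1.8.0_212"',
}
groups = group_nodes_by_java_version(outputs)
if len(groups) > 1:
    print("Java versions differ across nodes:", groups)
```

More than one key in the result means the nodes disagree; a key of None means the output from that node could not be parsed (Java may be missing there).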

Existing Cluster monitoring by Hortonworks Ambari

I have an existing 10-node cluster on RHEL 6.6 that was set up with plain Apache Hadoop configuration XMLs. Now I want to check the cluster status with Ambari. Would it be possible to install Hortonworks Ambari only for monitoring, not to install Hadoop?
No, Ambari must provision the cluster it's monitoring.
Ambari is designed around a Stack concept where each stack consists of several services. A stack definition is what allows Ambari to install, manage and monitor the services in the cluster.
In order to use Ambari with the Hadoop core that you built, you would have to provide your own Ambari stack definition.
Specifically, in your case, your existing Hadoop installation would not have the alerts.json descriptors that Ambari uses to provide alerts for a given service.
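To make the "stack definition" point concrete: in an Ambari stack, each service lives in its own directory containing descriptor files such as metainfo.xml (the service definition) and alerts.json (the alert definitions). A small Python sketch that reports which of these two files a service directory lacks; the expected-file list is deliberately minimal and the helper name is hypothetical, so treat it as an illustration rather than a validator for real stacks:

```python
from pathlib import Path

# Descriptor files this sketch looks for; a real service definition
# usually contains more (scripts, configuration, metrics).
EXPECTED_DESCRIPTORS = ("metainfo.xml", "alerts.json")


def missing_descriptors(service_dir):
    """Return the expected descriptor files that are absent from service_dir."""
    service_dir = Path(service_dir)
    return [
        name for name in EXPECTED_DESCRIPTORS
        if not (service_dir / name).is_file()
    ]
```

Calling `missing_descriptors` on a directory of a hand-built Hadoop install would return both names, which is the gap the answer above describes: without those descriptors Ambari has nothing to manage or alert on.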

How to install Apache Spark on HortonWorks HDP 2.2 (built using Ambari)

I successfully built a 5 node cluster of HortonWorks HDP 2.2 using Ambari.
However I don't see Apache Spark in the installed services list.
I did some research and found that Ambari does not install certain components such as Hue (Spark was not in that list, but I guess it's not installed).
How do I do a manual install of Apache spark on my 5 node HDP 2.2?
Or should I delete my cluster and perform a fresh install without using Ambari?
Hortonworks support for Spark is arriving but not fully complete (details and blog).
Instructions for how to integrate Spark with HDP can be found here.
You could build your own Ambari Stack for Spark. I recently did just that, but I cannot share that code :(
What I can do is share a tutorial I did on how to do any stack for Ambari, including Spark. There are many interesting issues with Spark that need to be addressed and are not covered through the tutorial. Anyways hope it helps. http://bit.ly/1HDBgS6
There is also a guide from the Ambari people here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38571133.
1) Ambari 1.7x does not install Accumulo, Hue, Ranger, or Solr services for the HDP 2.2 Stack. For installing Accumulo, Hue, Knox, Ranger, and Solr services, install HDP manually.
2) Apache Spark 1.2.0 on YARN with HDP 2.2: here.
3) Spark and Hadoop: Working Together:
Standalone deployment: With the standalone deployment one can statically allocate resources on all or a subset of machines in a Hadoop cluster and run Spark side by side with Hadoop MR. The user can then run arbitrary Spark jobs on her HDFS data. Its simplicity makes this the deployment of choice for many Hadoop 1.x users.
Hadoop Yarn deployment: Hadoop users who have already deployed or are planning to deploy Hadoop Yarn can simply run Spark on YARN without any pre-installation or administrative access required. This allows users to easily integrate Spark in their Hadoop stack and take advantage of the full power of Spark, as well as of other components running on top of Spark.
Spark In MapReduce : For the Hadoop users that are not running YARN yet, another option, in addition to the standalone deployment, is to use SIMR to launch Spark jobs inside MapReduce. With SIMR, users can start experimenting with Spark and use its shell within a couple of minutes after downloading it! This tremendously lowers the barrier of deployment, and lets virtually everyone play with Spark.
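For the YARN deployment option, submitting a job comes down to a spark-submit call that names YARN as the master. A hedged Python sketch that only assembles the command line; the jar path is a placeholder, and the flag spelling follows current Spark documentation (`--master yarn --deploy-mode ...`), whereas Spark 1.x documentation such as 1.2 writes the master as `yarn-cluster` or `yarn-client`:

```python
def build_spark_submit(app_jar, main_class, deploy_mode="cluster", app_args=()):
    """Assemble a spark-submit argument list for running on YARN."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--class", main_class,
        app_jar,
        *app_args,
    ]


# The jar path below is a placeholder, not a real install location.
cmd = build_spark_submit(
    "/path/to/spark-examples.jar",
    "org.apache.spark.examples.SparkPi",
    app_args=["10"],
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run` on a node that has the Spark client and the cluster's Hadoop configuration available.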

Resources