How do we install Apache BigTop with Ambari?

I am trying to find out how to deploy a Hadoop cluster with Ambari using Apache Bigtop.
According to the latest release, Bigtop 1.5:
https://blogs.apache.org/bigtop/
my understanding is that a Bigtop Mpack was added as a new feature that enables users to
deploy Bigtop components via Apache Ambari.
I am able to install the Bigtop components via the command line, but I cannot find any documentation on how to install these Bigtop Hadoop components via Ambari.
Can someone please point me to documentation that explains how to install the various Hadoop components (Bigtop packages) via Ambari?
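For reference, the command-line install I have working is roughly the following. This is only a sketch: the repo URL and package names are assumptions, so take the exact repo file for your OS from the Bigtop 1.5.0 download page.

    # Sketch: install Bigtop 1.5.0 components straight from the published repo.
    # The repo URL below is an assumption -- use the one for your OS/version
    # from the Bigtop download page.
    sudo wget -O /etc/yum.repos.d/bigtop.repo \
        https://downloads.apache.org/bigtop/bigtop-1.5.0/repos/centos-7/bigtop.repo
    sudo yum install -y hadoop-conf-pseudo zookeeper spark-core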
Thanks,

I'm from the Bigtop community. I don't have a comprehensive answer, but the Bigtop user mailing list recently had a discussion with several technical details that may answer your question:
https://lists.apache.org/thread.html/r8c5d8dfdee9b7d72164504ff2f2ea641ce39aa02364d60917eaa9fa5%40%3Cuser.bigtop.apache.org%3E
OTOH, you are always welcome to join the mailing list and ask questions. Our community is active and happy to answer questions.

Build a repo of Bigtop.
To install that repo with Ambari, you have to register the stack/version. You will need to create a version file (a rough sketch of this step follows below). I found an example of one here.
Complete the installation like you would with a normal build.
This is highly theoretical (I haven't done this myself). I have worked with a Bigtop Mpack before that took care of some of this work, but it's not production ready yet and works with an old version of Ambari, not the newest (I was able to install/stop/start HDFS/Hive). The instructions above should work with any version of Ambari.
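As a rough illustration of the register-the-stack/version step: the sketch below assumes Ambari 2.6+, that your Bigtop packages are already served over HTTP, and that you have written a version definition XML for them. The hostnames and file names are placeholders, and the endpoint should be checked against your Ambari version.

    # Register a hosted version definition file with Ambari's REST API.
    # Hostnames, credentials, and the XML file name are placeholders.
    curl -u admin:admin -H "X-Requested-By: ambari" -X POST \
        -d '{"VersionDefinition": {"version_url": "http://repo-host/BIGTOP-1.5.0.xml"}}' \
        http://ambari-host:8080/api/v1/version_definitions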

I have been able to test Matt Andruff's theory with a VM. Here was my process and where I stopped (a sketch of the build-and-host commands is included below):
Built a repo of Apache Bigtop 1.5.0
Built Bigtop using the Gradle wrapper (gradlew)
Installed Apache Ambari 2.6.1 on my system
Enabled the BigInsights build version XML file and modified the package version numbers to match my Bigtop build
Note: you can also build your own version file if you want, as Matt mentioned
Set up a web server to host the package repo
Pointed the repo URLs in the XML version file at my local web server
From there you can complete the installation of your packages as you would normally.
I have only done this with a single VM so far and will be trying to spin up a small cluster on AWS in the coming weeks.
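For the record, the build-and-host part of the list above looked roughly like this on my VM. It is only a sketch: the release tag, gradle task names, and output paths are from memory and should be checked against the Bigtop docs.

    # Build a subset of Bigtop 1.5.0 components as rpms and serve them.
    git clone https://github.com/apache/bigtop.git && cd bigtop
    git checkout release-1.5.0           # tag name is an assumption; check 'git tag'
    ./gradlew hadoop-rpm zookeeper-rpm   # one <component>-rpm task per component
    ./gradlew repo                       # generate yum repo metadata under output/
    # Serve the repo so the version XML's <baseurl> can point at it:
    cd output && python3 -m http.server 8000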

Related

Building Ambari HDP stacks from sources

I am trying to set up Ambari + HDP from sources (since Cloudera closed off the Hortonworks package repos). Can anyone share experience / a howto on this? Documentation is very scarce in this regard.
#alfheim the documentation is here:
https://cwiki.apache.org/confluence/display/AMBARI/Installation+Guide+for+Ambari+2.7.5
And a post with all the details:
Ambari 2.7.5 installation failure on CentOS 7
Be sure to get the correct versions of npm, Maven, Node, etc. There are some manual changes you may need to make inside the source files. You can find quite a few posts solving those issues here under the ambari tag; go back to page 2 or 3 for the most recent posts on building Ambari from source, or just search for any errors you hit during the build.
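For what it's worth, the core of my CentOS 7 build looked roughly like this. It is a sketch based on the Ambari 2.7.5 installation guide linked above: the toolchain setup (node/npm/maven) is omitted and the exact flags may need adjusting for your environment.

    # Fetch and build Ambari 2.7.5 from source, producing rpms.
    # (If this dist URL has been archived, use archive.apache.org instead.)
    wget https://www-eu.apache.org/dist/ambari/ambari-2.7.5/apache-ambari-2.7.5-src.tar.gz
    tar xzf apache-ambari-2.7.5-src.tar.gz && cd apache-ambari-2.7.5-src
    mvn versions:set -DnewVersion=2.7.5.0.0
    # (the ambari-metrics submodule may need its own versions:set; see the guide)
    mvn -B clean install rpm:rpm -DnewVersion=2.7.5.0.0 \
        -DskipTests -Dpython.ver="python >= 2.6"
    # Resulting rpms land under ambari-server/target/rpm and ambari-agent/target/rpm.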

How to build deb/rpm repos from open source Hadoop or publicly available HDP source code to be installed by Ambari

I am trying to install open source Hadoop, or build HDP from source, to be installed by Ambari. I can see that it is possible to build the Java packages for each component with the documentation available in the Apache repos, but how can I use those to build the rpm/deb packages that Hortonworks provides for the HDP distribution to be installed by Ambari?
#ShivamKhandelwal Building Ambari from source is a challenge, but one that can be accomplished with some persistence. In this post I have shared the commands I used recently to build Ambari 2.7.5 on CentOS:
Ambari 2.7.5 installation failure on CentOS 7
"Building HDP From Source" is very big task as it requires building each component separately, creating your own public/private repo which contains all the component repos or rpms for each operating system flavor. This is a monumental task which was previously managed by many employees and component contributors at Hortonworks.
When you install Ambari from HDP, it comes out of the box with their repos including their HDP stack (HDFS,Yarn,MR,Hive, etc). When you install Ambari From Source, there is no stack. The only solution is to Build Your Own Stack which is something I am expert at doing.
I am currently building a DDP stack as an example to share with the public. I started this project by reverse engineering a HDF Management Pack which includes stack structure (files/folders) to role out NiFi, Kafka, Zookeeper, and more. I have customized it to be my own stack with my own services and components (NiFi, Hue, Elasticsearch, etc).
My goal with DDP is to eventually make my own repos for the Components and Services I want, with the versions I want to install in my cluster. Next I will copy some HDP Components like HDFS,YARN,HIVE from the HDP stack directly into my DDP stack using the last free public HDP Stack (HDP 3.1.5).
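To make the rpm/repo part of this concrete, here is a hedged sketch of the usual route. The plain Apache component builds only produce tarballs; the rpm/deb packaging that HDP shipped is essentially what Bigtop automates, and the task names and paths below are assumptions.

    # Plain Apache Hadoop build: gives you a .tar.gz distribution, not rpms.
    mvn clean package -Pdist,native -DskipTests -Dtar
    # Bigtop wraps each component build in rpm/deb specs; from a Bigtop checkout:
    ./gradlew hadoop-rpm hive-rpm zookeeper-rpm
    createrepo output/     # turn the built rpms into a yum repo
    # Reference that repo from your custom stack's repoinfo.xml / version file.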

How to install CM over an existing non-CDH cluster

Is it possible to install CM over an existing non-CDH cluster?
For example, I have manually installed Hadoop and other services to my VMs.
Can I install CM and force it to manage my cluster?
It is doubtful you could do this, since CM expects either parcels or CDH packages to be installed with Hadoop. If you really wanted to try, it would be easier to install CM + CDH using packages and then overwrite the specific artifacts in the package, but this could be very tedious.
It is not possible to install Cloudera Manager on a non-CDH cluster. One reason is that Cloudera expects the installation to be carried out using CDH packages or CDH parcels (and even CDH packages and parcels can't coexist). Another reason is that the jars bundled by Cloudera differ from the jars available in the native distributions of components such as Hive. So it is not going to work.
Don't waste time attempting it.

Installing Spark through Ambari

I've configured a cluster of VMs via Ambari.
Now I'm trying to install Spark.
In all the tutorials (e.g. here) it's pretty simple; Spark installation is similar to that of other services:
But it appears that in my Ambari instance there is simply no such entry.
How can I add Spark entry to Ambari services?
There should be a SPARK folder under the /var/lib/ambari-server/resources/stacks/HDP/2.2/services directory. Additionally, there should be spark folders, identified by their version number, under /var/lib/ambari-server/resources/common-services/SPARK. Either someone modified your environment or it's a bad and/or non-standard install of Ambari.
I would suggest re-installing, as it is hard to say exactly what you need to add when it's unclear what else may be missing from the environment.
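A quick way to check what the answer above describes (the paths assume a default Ambari install with the HDP 2.2 stack):

    # Verify the Spark service definitions exist, then restart Ambari so the
    # service shows up under "Add Service".
    ls /var/lib/ambari-server/resources/stacks/HDP/2.2/services/SPARK
    ls /var/lib/ambari-server/resources/common-services/SPARK
    ambari-server restart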

Getting started with Hadoop and Eclipse

I'm following a couple of tutorials for setting up Hadoop with Eclipse.
This one is from Cloudera: http://v-lad.org/Tutorials/Hadoop/05%20-%20Setup%20SSHD.html
But this seems to focus on checking out the latest code from Hadoop and tweaking it.
That seems like a rare need; usually the latest release of Hadoop will suffice for most users, won't it?
Whereas this tutorial seems to focus on setting up and running Hadoop:
http://v-lad.org/Tutorials/Hadoop/05%20-%20Setup%20SSHD.html
I just want to run some basic MapReduce jobs to get started. I don't think I should be using the latest code from Hadoop, as the first link above suggests, just to get started?
Here is a blog entry and screencast on developing/debugging applications in Eclipse. The procedure works with most versions of Hadoop.
You may try this tutorial on installing Hadoop plugin for eclipse: http://bigsonata.com/?p=168
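If the immediate goal is just to run a basic MapReduce job, the examples jar that ships with every Hadoop release is the quickest start. A sketch, assuming you are in the unpacked Hadoop directory with HDFS already running:

    # Run the bundled word-count example against some sample input.
    hdfs dfs -mkdir -p /user/$USER/input
    hdfs dfs -put etc/hadoop/*.xml /user/$USER/input
    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /user/$USER/input /user/$USER/output
    hdfs dfs -cat /user/$USER/output/part-r-00000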
