HDP cluster: how to define disk partitions? - hadoop

When installing an HDP cluster, I tried to follow the Hortonworks best practice for defining disk partitions, as described in the following procedure: documentation
I am using Ambari 2.0 to install my stack. The default Hadoop installation path is /hadoop/xxx.
Is there a way to tell Ambari to use the best practice from the link above instead of /hadoop?
If not, what is the best way to do it manually?

The recommended practice is to set up the environment before you start installing HDP via Ambari. Ambari will not perform any of the activities described in the resource you shared.
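As a rough illustration of the manual preparation: once the disks are partitioned and mounted, the directory properties shown in Ambari's install wizard (for example `dfs.datanode.data.dir`) can be pointed at them instead of the /hadoop default. The sketch below only builds the comma-separated value for such a property. The mount points (`/grid/0`, `/grid/1`, ...) and the `hadoop/hdfs/data` suffix are assumptions chosen for illustration, not values taken from the linked documentation.

```python
from pathlib import PurePosixPath


def build_data_dirs(mount_points, suffix="hadoop/hdfs/data"):
    """Return a comma-separated list of data directories, one per mount point."""
    dirs = [str(PurePosixPath(mount) / suffix) for mount in mount_points]
    return ",".join(dirs)


if __name__ == "__main__":
    # Hypothetical mount points, created and mounted before running the Ambari wizard.
    mounts = ["/grid/0", "/grid/1", "/grid/2"]
    print(build_data_dirs(mounts))
```

The resulting string can be pasted into the corresponding directory field during the wizard's service customization step; the partitioning and mounting themselves still have to be done on each host beforehand.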

Related

Apache Ambari installation

I'm trying to install the Ambari server and agents, and I have a question about Ambari.
Whenever I install Ambari, it always ends up linked to Hortonworks.
I have a Hadoop cluster of my own on Ubuntu 16.0. Does Ambari only work with HDP, or is it possible to make it work with custom-built clusters too?
If it is possible, please share detailed documentation.
It's not clear where you downloaded Ambari from, but it sounds like you used the Hortonworks version of it rather than the one from https://ambari.apache.org
Ambari works with the concept of stacks. Each stack has a set of services and components. HDP is one such stack, but there are others, and you can even define your own. So yes, you can manage your own Hadoop installation components, but that would not be much different from what Hortonworks already provides.
Besides, the HDP services and components have been tested to work together more thoroughly than an off-the-shelf Hadoop installation.
If you don't want HDP components, there is also the Apache Bigtop project, which provides installation packages for many Hadoop-related services.
Ambari expects Java and Hadoop to be installed in a certain way. I'm not sure how easy it is to set up for an existing Hadoop install.

Can I set up plain Hadoop (not HDP) with Apache Ambari

I have seen many tutorials for Ambari, and all of them use the HDP stack from Hortonworks, so I want to know whether I can set up a plain Hadoop cluster without HDP.
Can I set up Hadoop using Ambari without HDP?
To set up a 'plain' Hadoop cluster, you will have to define your own stack definition to work with Ambari.
Documentation on how to set up your own stack and services can be found here.
Additional Note: There are efforts from ODPi to make management of custom stacks and services through Ambari standardized.
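To give a feel for what a stack or service definition involves, here is a minimal sketch that generates a `metainfo.xml` for a single custom service. The element names follow the sample service layout from the Ambari wiki as I remember it, so check them against the documentation for your Ambari version. The service name `MYHADOOP`, the component name, and the script path are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_service_metainfo(service_name, version, component_name, script_path):
    """Build a minimal metainfo.xml tree describing one service with one master component."""
    root = ET.Element("metainfo")
    ET.SubElement(root, "schemaVersion").text = "2.0"
    services = ET.SubElement(root, "services")
    service = ET.SubElement(services, "service")
    ET.SubElement(service, "name").text = service_name
    ET.SubElement(service, "version").text = version
    components = ET.SubElement(service, "components")
    component = ET.SubElement(components, "component")
    ET.SubElement(component, "name").text = component_name
    ET.SubElement(component, "category").text = "MASTER"
    ET.SubElement(component, "cardinality").text = "1"
    # The command script is what the agent would run for install/start/stop.
    command = ET.SubElement(component, "commandScript")
    ET.SubElement(command, "script").text = script_path
    ET.SubElement(command, "scriptType").text = "PYTHON"
    return root


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    tree = build_service_metainfo("MYHADOOP", "1.0.0", "MYHADOOP_MASTER", "scripts/master.py")
    print(ET.tostring(tree, encoding="unicode"))
```

A real stack needs more than this file (configuration definitions and the lifecycle scripts themselves), which is why defining your own stack is a noticeable amount of work compared with using an existing one.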

How to integrate Hadoop using Ambari without HDP?

I have a Hadoop cluster running Apache Hadoop 2.0.7.
I want to know how to integrate Ambari with Apache Hadoop without HDP (Hortonworks).
If I used HDP the solution would be easy, but I don't want to use it in my situation.
Do you have any ideas?
Ambari relies on 'Stack' definitions to describe what services the Hadoop cluster consists of. Hortonworks defined a custom Ambari stack; it's called HDP.
You could define your own stack and use any services and respective versions that you want. See the Ambari wiki for more information about defining stacks and services.
That being said, I don't think it's possible to use your pre-existing installation of Hadoop with Ambari. Ambari is used to provision and manage Hadoop clusters. It keeps track of the state of each of its stack's services, and the state of each service's components. Since your cluster is already provisioned, it would be difficult (maybe impossible) to add it to an Ambari instance.
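The state tracking mentioned above is visible through Ambari's REST API. The following is a small sketch, assuming the `/api/v1/clusters/<cluster>/services/<service>` URL layout and a `ServiceInfo.state` field in the response, which is how I understand the API to look. The server address, cluster name, and sample response body are made up for illustration, and no request is actually sent.

```python
import json


def service_url(base_url, cluster, service):
    """Build the REST URL for one service of one cluster."""
    return f"{base_url.rstrip('/')}/api/v1/clusters/{cluster}/services/{service}"


def service_state(response_body):
    """Extract the state string from a service response body."""
    payload = json.loads(response_body)
    return payload["ServiceInfo"]["state"]


if __name__ == "__main__":
    # Hypothetical server address and cluster name.
    print(service_url("http://ambari.example.com:8080", "mycluster", "HDFS"))
    # Illustrative response body, trimmed to the fields used here.
    sample = (
        '{"ServiceInfo": {"cluster_name": "mycluster", '
        '"service_name": "HDFS", "state": "STARTED"}}'
    )
    print(service_state(sample))
```

Because Ambari records this state itself while it provisions a cluster, a cluster that was installed by other means has no such records, which is the root of the difficulty described above.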

How to install Apache Spark on HortonWorks HDP 2.2 (built using Ambari)

I successfully built a 5 node cluster of HortonWorks HDP 2.2 using Ambari.
However I don't see Apache Spark in the installed services list.
I did some research and found that Ambari does not install certain components such as Hue (Spark was not in that list, but I guess it's not installed).
How do I do a manual install of Apache Spark on my 5 node HDP 2.2?
Or should I delete my cluster and perform a fresh install without using Ambari?
Hortonworks support for Spark is arriving but not fully complete (details and blog).
Instructions for how to integrate Spark with HDP can be found here.
You could build your own Ambari Stack for Spark. I recently did just that, but I cannot share that code :(
What I can do is share a tutorial I did on how to build any stack for Ambari, including Spark. There are many interesting issues with Spark that need to be addressed and are not covered in the tutorial. Anyway, hope it helps. http://bit.ly/1HDBgS6
There is also a guide from the Ambari people here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38571133.
1) Ambari 1.7x does not install Accumulo, Hue, Ranger, or Solr services for the HDP 2.2 Stack.
To install the Accumulo, Hue, Knox, Ranger, and Solr services, install HDP manually.
2) Apache Spark 1.2.0 on YARN with HDP 2.2: here.
3) Spark and Hadoop: Working Together:
Standalone deployment: With the standalone deployment one can statically allocate resources on all or a subset of machines in a Hadoop cluster and run Spark side by side with Hadoop MR. The user can then run arbitrary Spark jobs on her HDFS data. Its simplicity makes this the deployment of choice for many Hadoop 1.x users.
Hadoop Yarn deployment: Hadoop users who have already deployed or are planning to deploy Hadoop Yarn can simply run Spark on YARN without any pre-installation or administrative access required. This allows users to easily integrate Spark in their Hadoop stack and take advantage of the full power of Spark, as well as of other components running on top of Spark.
Spark In MapReduce : For the Hadoop users that are not running YARN yet, another option, in addition to the standalone deployment, is to use SIMR to launch Spark jobs inside MapReduce. With SIMR, users can start experimenting with Spark and use its shell within a couple of minutes after downloading it! This tremendously lowers the barrier of deployment, and lets virtually everyone play with Spark.
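For the YARN deployment option, submitting a job comes down to a `spark-submit` call with a YARN master. The sketch below only assembles such a command line; it does not run it. It assumes a Spark 1.x style `yarn-cluster` master value, and the jar path is a placeholder you would replace with the location of the examples jar in your own Spark install.

```python
import shlex


def spark_submit_command(app_jar, main_class, master="yarn-cluster",
                         num_executors=3, app_args=()):
    """Assemble a spark-submit argument list for running a job on YARN."""
    cmd = [
        "spark-submit",
        "--class", main_class,
        "--master", master,
        "--num-executors", str(num_executors),
        app_jar,
    ]
    # Anything after the jar is passed through to the application itself.
    cmd.extend(str(arg) for arg in app_args)
    return cmd


if __name__ == "__main__":
    command = spark_submit_command(
        "/path/to/spark-examples.jar",  # placeholder path
        "org.apache.spark.examples.SparkPi",
        app_args=[10],
    )
    # Print a shell-safe version of the command for copy and paste.
    print(" ".join(shlex.quote(part) for part in command))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting problems.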

How to deploy Ambari for an existing Hadoop cluster

As mentioned in the title, can I skip the Hadoop cluster installation step, given that the cluster already exists and is in service?
Ambari relies on 'Stack' definitions to describe what services the Hadoop cluster consists of. Hortonworks defined a custom Ambari stack; it's called HDP.
You could define your own stack and use any services and respective versions that you want. See the Ambari wiki for more information about defining stacks and services.
That being said, I don't think it's possible to use your pre-existing installation of Hadoop with Ambari. Ambari is used to provision and manage Hadoop clusters. It keeps track of the state of each of its stack's services, and the state of each service's components. Since your cluster is already provisioned, it would be difficult (maybe impossible) to add it to an Ambari instance.
One of the minimum requirements for installing Ambari is removing pre-existing installations of the tools mentioned here. It does not say to remove any pre-existing Hadoop installation.
