Cloudera Hue alternative for Hortonworks

Hue can be used to deploy Oozie jobs through its Oozie editor. What alternative do we have when using Hortonworks Ambari? I want to deploy Oozie jobs but want to avoid the Oozie CLI client.

Latest versions of Ambari support the Oozie Workflow Manager.
https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.1.0/bk_workflow-management/content/ch_wfm_basics.html
Or, you could download and install/configure Hue on your cluster yourself; Ambari doesn't need to be the central configuration component of the cluster.

Related

YARN - Specify Which Application to Run on Which NodeManager

I have a Hadoop YARN cluster consisting of one ResourceManager and 6 NodeManagers. I want to run both Flink and Spark applications on the cluster, so I have two main questions about YARN:
In the case of Spark, should I install and configure Spark on the ResourceManager and on each NodeManager? When I want to submit a Spark application on YARN, does a Spark cluster (master and slaves) have to be running in addition to the YARN ResourceManager and NodeManagers?
Can I set up YARN so that Flink runs only on certain NodeManagers?
Thanks
For the first question, that depends on whether you're using a packaged Hadoop distribution (such as Cloudera CDH or Hortonworks HDP) or not. A distro will likely take care of this for you. If you're not using a distribution, you need to decide whether you want to run Spark on YARN or Spark standalone.
For the second question, you can target specific NodeManagers if you are using the Capacity Scheduler with the node-labelling feature enabled, which is available in Hadoop 2.6 and higher.
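As a rough sketch of how node labelling is wired up (the label name flink, the queue name flinkq, and the host worker5 are made up for illustration, and the exact rmadmin syntax varies slightly across Hadoop versions):

    # Register a label with the ResourceManager and pin it to a node
    # (label "flink" and host "worker5" are illustrative):
    yarn rmadmin -addToClusterNodeLabels flink
    yarn rmadmin -replaceLabelsOnNode "worker5=flink"

    # In capacity-scheduler.xml, give a queue access to the label:
    #   yarn.scheduler.capacity.root.flinkq.accessible-node-labels = flink
    #   yarn.scheduler.capacity.root.flinkq.accessible-node-labels.flink.capacity = 100
    #   yarn.scheduler.capacity.root.flinkq.default-node-label-expression = flink

    # Start the Flink YARN session in that queue so its containers are
    # scheduled only on the labelled NodeManagers:
    ./bin/yarn-session.sh -n 2 -qu flinkq

Applications submitted to other queues will then stay off the labelled nodes.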

Installing Hue on an Ambari cluster

I have an Ambari cluster to manage my Hadoop/Spark jobs. I want to schedule my workflows using an Oozie editor, and Hue is the most popular and easiest one to use. How do I install Hue on top of an existing Hadoop cluster managed by Ambari?
Thanks
Hue is a service created by Cloudera. You cannot install it using Ambari, but you can download a Hue package and install it based on the official documentation. You should check this article - Installing Hue 3.9 on HDP 2.3.
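At a high level the installation looks like the sketch below (paths, versions, and host names are illustrative; the article above has the authoritative steps for HDP):

    # Build and install Hue from the release tarball:
    tar xzf hue-3.9.0.tgz && cd hue-3.9.0
    make install                      # installs under /usr/local/hue by default

    # Point /usr/local/hue/desktop/conf/hue.ini at the cluster, e.g.:
    #   [hadoop] -> [[hdfs_clusters]] -> [[[default]]]
    #     fs_defaultfs=hdfs://namenode-host:8020
    #     webhdfs_url=http://namenode-host:50070/webhdfs/v1
    #   [liboozie]
    #     oozie_url=http://oozie-host:11000/oozie

    # Start the Hue server:
    /usr/local/hue/build/env/bin/supervisor

You also need WebHDFS (or HttpFS) enabled and the Hue user configured as a proxy user in core-site.xml so Hue can act on behalf of its users.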

Installation of Oozie on a separate machine from Hadoop

Very new to Oozie, so please excuse me if I sound like a newbie.
I have a Hadoop cluster which is up and running. I want to install Oozie on a separate machine from Hadoop. Is this possible? The reason for asking is that every installation guide I have seen asks to install Hadoop on the same machine, so I am not sure whether it is technically possible to have Oozie on a separate machine from Hadoop.
Thanks in advance
The Oozie server serves clients' requests; it is a web application running in an embedded Tomcat, it can be installed on any machine from which Hadoop is reachable, and it is not tied to Hadoop itself. You can specify Hadoop's nameNode and jobTracker in the workflow properties so Oozie knows where to send its jobs.
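For example, a minimal job.properties (host names and paths here are placeholders) is all Oozie needs to find the cluster, and the client can talk to a remote Oozie server via its URL:

    # Minimal job.properties sketch; hosts and paths are placeholders.
    nameNode=hdfs://namenode-host:8020
    jobTracker=jobtracker-host:8021
    queueName=default
    # HDFS location of the deployed workflow application:
    oozie.wf.application.path=${nameNode}/user/${user.name}/apps/my-workflow

    # Submit from any machine that can reach the Oozie server:
    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run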

How to install Apache Spark on HortonWorks HDP 2.2 (built using Ambari)

I successfully built a 5 node cluster of HortonWorks HDP 2.2 using Ambari.
However I don't see Apache Spark in the installed services list.
I did some research and found that Ambari does not install certain components, like Hue, etc. (Spark was not in that list, but I guess it's not installed).
How do I do a manual install of Apache Spark on my 5-node HDP 2.2 cluster?
Or should I delete my cluster and perform a fresh install without using Ambari?
Hortonworks support for Spark is arriving but not fully complete (details and blog).
Instructions for how to integrate Spark with HDP can be found here.
You could build your own Ambari Stack for Spark. I recently did just that, but I cannot share that code :(
What I can do is share a tutorial I did on how to build any stack for Ambari, including Spark. There are many interesting issues with Spark that need to be addressed and are not covered in the tutorial. Anyway, hope it helps. http://bit.ly/1HDBgS6
There is also a guide from the Ambari people here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38571133.
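For orientation, the heart of a custom Ambari stack service is its metainfo.xml; a minimal sketch (the service name SPARK and the script path are illustrative, not the tutorial's actual code) looks like this:

    <!-- Minimal sketch of an Ambari custom service definition (metainfo.xml).
         Service name and script path are illustrative. -->
    <metainfo>
      <schemaVersion>2.0</schemaVersion>
      <services>
        <service>
          <name>SPARK</name>
          <displayName>Spark</displayName>
          <comment>Apache Spark added as a custom service</comment>
          <version>1.2.0</version>
          <components>
            <component>
              <name>SPARK_CLIENT</name>
              <category>CLIENT</category>
              <cardinality>1+</cardinality>
              <commandScript>
                <script>scripts/spark_client.py</script>
                <scriptType>PYTHON</scriptType>
                <timeout>600</timeout>
              </commandScript>
            </component>
          </components>
        </service>
      </services>
    </metainfo>

Ambari then calls the referenced Python script's install/configure hooks when the service is added through the UI.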
1) Ambari 1.7x does not install Accumulo, Hue, Ranger, or Solr services for the HDP 2.2 Stack.
To install the Accumulo, Hue, Knox, Ranger, and Solr services, install HDP manually.
2) Apache Spark 1.2.0 on YARN with HDP 2.2: here.
3) Spark and Hadoop: Working Together:
Standalone deployment: With the standalone deployment one can statically allocate resources on all or a subset of machines in a Hadoop cluster and run Spark side by side with Hadoop MR. The user can then run arbitrary Spark jobs on her HDFS data. Its simplicity makes this the deployment of choice for many Hadoop 1.x users.
Hadoop Yarn deployment: Hadoop users who have already deployed or are planning to deploy Hadoop Yarn can simply run Spark on YARN without any pre-installation or administrative access required. This allows users to easily integrate Spark in their Hadoop stack and take advantage of the full power of Spark, as well as of other components running on top of Spark.
Spark In MapReduce : For the Hadoop users that are not running YARN yet, another option, in addition to the standalone deployment, is to use SIMR to launch Spark jobs inside MapReduce. With SIMR, users can start experimenting with Spark and use its shell within a couple of minutes after downloading it! This tremendously lowers the barrier of deployment, and lets virtually everyone play with Spark.
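As a concrete sketch of the YARN deployment option (Spark 1.2-era syntax; the config path and the examples jar name depend on your build and are illustrative):

    # Run the bundled SparkPi example in yarn-cluster mode.
    # HADOOP_CONF_DIR must point at the cluster's client configs.
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    ./bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master yarn-cluster \
        --num-executors 3 \
        --executor-memory 1g \
        lib/spark-examples-1.2.0-hadoop2.6.0.jar 10

No Spark master or worker daemons are needed in this mode; YARN allocates the driver and executor containers.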

Should Oozie be installed on all the Hadoop nodes inside a single Hadoop cluster?

I am running Oozie over Hadoop 1.0.3. I wanted to find out whether Oozie has to be installed on all the Hadoop nodes inside a single cluster, or whether it is sufficient to install it on the (Hadoop) master node only. I searched through the Oozie documentation but could not find the answer to my question.
Thank you,
Mohsin
Oozie need not be installed on all the nodes in a cluster. It can be installed on a dedicated machine or alongside any other framework. Check this guide for a quick installation of Oozie.
Note that Oozie has a client and a server component. The server component contains a scheduler and a workflow engine, and the workflow engine uses hPDL (Hadoop Process Definition Language) for defining workflows.
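For a feel of hPDL, a minimal workflow definition (the action name and body are illustrative) looks like this:

    <!-- Minimal hPDL workflow sketch; the action name and body are illustrative. -->
    <workflow-app xmlns="uri:oozie:workflow:0.2" name="example-wf">
      <start to="my-mr-action"/>
      <action name="my-mr-action">
        <map-reduce>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>

The Oozie server parses this definition and drives the transitions; the client only needs to submit it.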
