What is Hue all about? - hadoop

I am new to Big Data and want to know about Hue. All I know is that it is a web interface for managing the Hadoop ecosystem. Please let me know whether I can install it on my PC (Ubuntu Precise). I am running Apache Hadoop 1.2.1 in pseudo-distributed mode with Pig and Hive.
Thanks in advance.

Hue is a web interface for analyzing data with Apache Hadoop. You can install it on any PC, with any Hadoop version.
Hue is a suite of applications that provide web-based access to CDH components and a platform for building custom applications.
The following figure illustrates how Hue works. Hue Server is a "container" web application that sits in between your CDH installation and the browser. It hosts the Hue applications and communicates with various servers that interface with CDH components.
All the explanations about Hue, along with downloads, are here:
http://gethue.com/

Related

Hadoop distribution

I was using IBM BigInsights through VNC (remote access) provided by the university where I study, but I can't access the Internet from that desktop. To use some data samples available online, I decided to install Hadoop on my laptop (single-node cluster), but I found that there are many distributions. What is the best free Hadoop distribution for training as a beginner?
1) Amazon Elastic MapReduce
2) Cloudera CDH Hadoop Distribution
3) Hortonworks Data Platform (HDP)
4) MapR Hadoop Distribution
5) IBM Open Platform
6) Microsoft Azure's HDInsight - Cloud-based Hadoop Distribution
7) Pivotal Big Data Suite
8) Datameer Professional
9) Datastax Enterprise Analytics
10) Dell- Cloudera Apache Hadoop Solution.
CDH and Hortonworks are the easiest to get a single node cluster up and running, and are also very widely used so you can find a lot of troubleshooting resources.
If you just want to write application code/run arbitrary MapReduce jobs rather than learn the Hadoop systems architecture, then Amazon EMR is more suitable.

Apache ambari installation

I'm trying to install Ambari server and agents, and I have a question about Ambari.
Whenever I try to install it, it always ends up linked to Hortonworks.
I have a Hadoop cluster of my own on Ubuntu 16.0. Will Ambari only work with HDP, or is it possible to make it work with custom-built clusters as well?
If possible, please share detailed documentation.
It's not clear where you downloaded Ambari from, but it sounds like you used the Hortonworks version of it rather than getting it directly from https://ambari.apache.org
Ambari works with the concept of stacks. Each stack has a set of services and components. HDP is such a stack, but there are others, and you can even define your own. So yes, you can manage your own Hadoop installation components, but that would not be much different from what Hortonworks already provides.
Besides, the HDP services and components have been tested to work together more thoroughly than an off-the-shelf Hadoop installation.
If you don't want HDP components, there is also the Apache Bigtop project, which provides installation packages for many Hadoop-related services.
Ambari expects Java and Hadoop to be installed in a certain way. I'm not sure how easy it is to set up for an existing Hadoop install.
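To make the stack concept concrete, here is a minimal sketch of how one might list the services a stack defines through Ambari's REST API. It assumes the server exposes the /api/v1/stacks/... resources and that the response has the items / StackServices / service_name shape. The host name, port and sample payload are hypothetical, so check them against your own Ambari version before relying on this.

```python
from urllib.parse import quote


def stack_services_url(base_url, stack, version):
    """Build the URL that lists the services of one stack version.

    The path layout is an assumption based on Ambari's v1 REST API.
    """
    return "{}/api/v1/stacks/{}/versions/{}/services".format(
        base_url.rstrip("/"),
        quote(stack, safe=""),
        quote(version, safe=""),
    )


def service_names(response):
    """Extract sorted service names from a parsed JSON response (a dict)."""
    return sorted(
        item["StackServices"]["service_name"]
        for item in response.get("items", [])
    )


# Hypothetical payload, trimmed to the fields this sketch reads.
sample_response = {
    "items": [
        {"StackServices": {"service_name": "YARN", "stack_name": "HDP"}},
        {"StackServices": {"service_name": "HDFS", "stack_name": "HDP"}},
    ]
}

if __name__ == "__main__":
    # ambari.example.com is a placeholder host, not a real server.
    print(stack_services_url("http://ambari.example.com:8080/", "HDP", "2.2"))
    print(service_names(sample_response))
```

Fetching the URL with an HTTP client and Ambari credentials would return the real payload. The same two functions would apply to a custom stack name instead of HDP.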

Configuring Hue to point to a HDInsight HDP cluster

I have downloaded the source code of Hue and built it locally on my Ubuntu (16.04) system. I want to configure it to point to the head node of my HDInsight HDP cluster so that I can access my Hive databases. I am aware of the script action, but I want Hue on a remote system, pointing to the cluster. How can I go about this?

How do I install components such as Apache Drill and Apache Hue in IBM Bluemix BigInsights Apache Hadoop

I am new to the IBM Bluemix platform and am exploring its BigInsights service. I can see preconfigured components such as Pig, Hive, HBase and others, but I want to know how I can install services like Drill or Hue, which are not configured by default. Also, SSH to the cluster nodes gives restricted access with no sudo rights, which one would need to run yum commands. Does Bluemix allow root access? I cannot see any such option. Thanks in advance.
As far as I know, it is not possible.
But you can use http://www.softlayer.com/ to build your own IOP (IBM Open Platform) Cluster in the cloud.
If you are interested in IBM's value-adds and just want to try them out:
https://www.youtube.com/watch?v=4p7LDeu_qQQ is a nice tutorial on setting up your own cluster via Docker.
This tutorial should be still valid for Hue:
https://developer.ibm.com/hadoop/2015/06/02/deploying-hue-on-ibm-biginsights/
Installing Drill doesn't look complicated:
https://drill.apache.org/docs/installing-drill-in-distributed-mode/
In conclusion: you need to move away from Bluemix if you want a more customised BigInsights. But there are options: SoftLayer, AWS, or just your local computer (if you have sufficient resources, since some components like HBase need a minimum number of nodes).

How to install Apache Spark on HortonWorks HDP 2.2 (built using Ambari)

I successfully built a 5 node cluster of HortonWorks HDP 2.2 using Ambari.
However I don't see Apache Spark in the installed services list.
I did some research and found that Ambari does not install certain components such as Hue (Spark was not in that list, but I guess it's not installed either).
How do I do a manual install of Apache Spark on my 5 node HDP 2.2?
Or should I delete my cluster and perform a fresh install without using Ambari?
Hortonworks support for Spark is arriving but not fully complete (details and blog).
Instructions for how to integrate Spark with HDP can be found here.
You could build your own Ambari Stack for Spark. I recently did just that, but I cannot share that code :(
What I can do is share a tutorial I wrote on how to build any stack for Ambari, including Spark. There are many interesting issues with Spark that need to be addressed and are not covered in the tutorial. Anyway, I hope it helps: http://bit.ly/1HDBgS6
There is also a guide from the Ambari people here: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38571133.
1) Ambari 1.7x does not install Accumulo, Hue, Ranger, or Solr services for the HDP 2.2 Stack. For installing the Accumulo, Hue, Knox, Ranger, and Solr services, install HDP manually.
2) Apache Spark 1.2.0 on YARN with HDP 2.2: here.
3) Spark and Hadoop: Working Together:
Standalone deployment: With the standalone deployment one can statically allocate resources on all or a subset of machines in a Hadoop cluster and run Spark side by side with Hadoop MR. The user can then run arbitrary Spark jobs on her HDFS data. Its simplicity makes this the deployment of choice for many Hadoop 1.x users.
Hadoop YARN deployment: Hadoop users who have already deployed or are planning to deploy Hadoop YARN can simply run Spark on YARN without any pre-installation or administrative access required. This allows users to easily integrate Spark in their Hadoop stack and take advantage of the full power of Spark, as well as of other components running on top of Spark.
Spark In MapReduce : For the Hadoop users that are not running YARN yet, another option, in addition to the standalone deployment, is to use SIMR to launch Spark jobs inside MapReduce. With SIMR, users can start experimenting with Spark and use its shell within a couple of minutes after downloading it! This tremendously lowers the barrier of deployment, and lets virtually everyone play with Spark.
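As a rough illustration of how the first two deployment modes differ in practice, the sketch below builds a spark-submit command line for each. It assumes the Spark 1.x conventions this answer was written against: a spark://host:port master URL for standalone (7077 being the usual default port) and the yarn-client / yarn-cluster master names for YARN. Later Spark releases use --master yarn with --deploy-mode instead. The class name, jar and host are hypothetical, and SIMR is left out because its launcher is a separate tool.

```python
import shlex


def master_url(mode, host=None, port=7077):
    """Return the --master value for a deployment mode (Spark 1.x naming)."""
    if mode == "standalone":
        if not host:
            raise ValueError("standalone mode needs the master host")
        return "spark://{}:{}".format(host, port)
    if mode in ("yarn-client", "yarn-cluster"):
        # On YARN the ResourceManager address comes from the Hadoop
        # configuration, so no host is passed on the command line.
        return mode
    raise ValueError("unsupported mode: {}".format(mode))


def spark_submit_command(mode, main_class, app_jar, host=None, app_args=()):
    """Assemble the argument list for a spark-submit invocation."""
    cmd = ["spark-submit", "--master", master_url(mode, host),
           "--class", main_class, app_jar]
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    for mode, host in (("standalone", "master.example.com"),
                       ("yarn-cluster", None)):
        cmd = spark_submit_command(mode, "com.example.WordCount",
                                   "wordcount.jar", host=host,
                                   app_args=["hdfs:///data/input"])
        print(" ".join(shlex.quote(part) for part in cmd))
```

The only thing that changes between the two modes is the --master value. The application jar and its HDFS input path stay the same, which is what makes moving a job between standalone and YARN straightforward.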

Resources