I know Hue can be installed on an HDInsight HDP cluster by deploying it on an edge node of the cluster (using a script action, link). It works fine, but it asks for the cluster credentials first and then directs me to the Hue login page. Is there a way to get rid of those credentials?
Alternatively, is it possible to deploy Hue on a remote system and then point it at my HDInsight HDP cluster? If so, how do I go about it?
And which of the above two approaches is better?
Based on my understanding and experience, here are answers to your questions:
There is no way to get rid of those credentials, because they authenticate the Resource Manager template deployment, not just the cluster.
It's not possible to deploy Hue on a remote system, because, as the official Hue manual says here, "Hue consists of a web service that runs on a special node in your cluster."
Hope it helps.
Related
I just installed the Hortonworks Sandbox via VirtualBox. When I started Ambari, every service was red, as you can see in this screenshot. Have I missed something? I'm a beginner in Hadoop.
Actually, when we start the HDP Sandbox, all services go into the starting stage except Storm, Atlas, and HBase (this can be checked via the gear icon on the top right, where you can see the reason behind the failed services).
Try to manually start the services in the following order (a sketch of doing this through the Ambari REST API follows the list):
Zookeeper
HDFS
YARN
MapReduce
Hive
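If clicking through the Ambari UI gets tedious, the same thing can be done over the Ambari REST API. Below is a minimal sketch, assuming the Sandbox defaults (endpoint http://localhost:8080, cluster name "Sandbox", admin/admin credentials); the service names are Ambari's service IDs (e.g. MAPREDUCE2 for MapReduce), so adjust them for your stack.
```python
# Minimal sketch: start the Sandbox services in the order above through the
# Ambari REST API. Endpoint, cluster name "Sandbox", and admin/admin
# credentials are the Sandbox defaults -- adjust for your environment.
import requests

AMBARI = "http://localhost:8080/api/v1/clusters/Sandbox"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}  # Ambari requires this header on write calls

# Ambari service IDs, in the same order as the list above.
services = ["ZOOKEEPER", "HDFS", "YARN", "MAPREDUCE2", "HIVE"]

for service in services:
    body = {
        "RequestInfo": {"context": "Start {} via REST".format(service)},
        "Body": {"ServiceInfo": {"state": "STARTED"}},
    }
    resp = requests.put("{}/services/{}".format(AMBARI, service),
                        json=body, auth=AUTH, headers=HEADERS)
    print(service, resp.status_code)
```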
Sorry for my rookie question. I was just wondering whether I could install Hue or Spark on my personal PC at home and SSH (through OpenVPN) into my company's Cloudera cluster? I have no computer science background, and I really need to retrieve some data and start my data analysis work. I'm also not very comfortable with the terminal view of Hive or Impala, which is already installed on the server.
Thanks in advance! Cheers.
It is not possible, because Cloudera will warn that the Spark or Hue nodes are losing their heartbeat/ping due to your network latency.
What you want is possible if your PC is added as a node of the same Hadoop cluster.
After that, you may install Hue as a service on it.
As stated in the other answer, network latency could create issues.
Alternatively, if your company is using the Cloudera distribution, it should already have Hue installed, which you can use over the network using PuTTY (with X11 forwarding) or any similar tool.
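Since Hue is just a web UI, another similar option is plain SSH local port forwarding. Here is a hedged sketch using the sshtunnel Python package; the gateway host, the Hue host, the user credentials, and the default Hue port 8888 are all assumptions you would replace with your company's values.
```python
# Hedged sketch: forward a local port to the remote Hue web UI over SSH.
# The gateway host, the Hue host, the user, and the Hue port (8888 is
# Hue's default) are assumptions -- substitute your company's values.
from sshtunnel import SSHTunnelForwarder  # pip install sshtunnel

with SSHTunnelForwarder(
    ("gateway.example.com", 22),                       # host you can reach over the VPN
    ssh_username="your_user",
    ssh_password="your_password",                      # or ssh_pkey="~/.ssh/id_rsa"
    remote_bind_address=("hue-node.internal", 8888),   # node where the Hue service runs
    local_bind_address=("127.0.0.1", 8888),
) as tunnel:
    print("Open http://127.0.0.1:8888 in a local browser while this is running.")
    input("Press Enter to close the tunnel...")
```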
I am new to the IBM Bluemix platform and exploring its BigInsights service. I can see pre-configured components such as Pig, Hive, HBase, and others, but I want to know how I can install services like Drill or Hue, which are not configured by default. Also, SSH to the cluster nodes allows only restricted access with no sudo rights, in case one needs to run yum commands. Does Bluemix allow root access? I cannot see a way to get it. Thanks in advance.
As far as I know, it is not possible.
But you can use http://www.softlayer.com/ to build your own IOP (IBM Open Platform) Cluster in the cloud.
If you are interested in IBM's value-adds and just want to try them out, https://www.youtube.com/watch?v=4p7LDeu_qQQ is a nice tutorial on setting up your own cluster via Docker.
This tutorial should still be valid for Hue:
https://developer.ibm.com/hadoop/2015/06/02/deploying-hue-on-ibm-biginsights/
Installing Drill doesn't look complicated:
https://drill.apache.org/docs/installing-drill-in-distributed-mode/
In conclusion: you need to move away from Bluemix if you want a more customised BigInsights. But there are options: SoftLayer, AWS, .. or just your local computer (if you have sufficient resources, since some components like HBase need a minimum number of nodes).
We have deployed Spark on Mesos, and we are having problems when trying to read from HDFS.
Out of the box we tried using Kerberos, which works fine in local mode, but fails when running on Mesos (even though we made sure that each machine had a valid token).
My question is, what alternatives do we have?
According to this document, the only two options are Kerberos and a shared secret, of which Kerberos only works on YARN.
I haven't found any article on how a shared secret could be set up for HDFS.
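For what it's worth, the shared secret described in that document configures Spark's own RPC authentication (driver/executor traffic), not access to HDFS, which still relies on Kerberos. A minimal PySpark sketch of how that secret would be set, with a placeholder Mesos master URL and secret value:
```python
# Hedged sketch: the "shared secret" from the Spark security docs protects
# Spark's own RPC traffic (driver <-> executors); it does NOT authenticate
# access to HDFS, which still relies on Kerberos. Master URL and the secret
# value are placeholders.
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setAppName("shared-secret-example")
    .setMaster("mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos")  # assumed Mesos master
    .set("spark.authenticate", "true")
    .set("spark.authenticate.secret", "replace-with-a-long-random-secret")
)
sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())
sc.stop()
```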
I am new to Cloudera. I installed Cloudera on my system successfully, and I have two questions:
Consider a machine with some nodes already running Hadoop with some data. Can we install Cloudera to use the existing Hadoop without making any changes or modifications to the data stored in the existing Hadoop?
I installed Cloudera on my machine, and I have another three machines to add to the cluster. I want to know: do I have to install Cloudera on those three machines before adding them to the cluster, or can we add a node to the cluster without installing Cloudera on that particular node?
Thanks in advance. Can anyone please give some information about the above questions?
Answers to your questions:
1. If you want to migrate to CDH from an existing Apache distribution, you can follow this link.
Excerpt:
Overview
The migration process does require a moderate understanding of Linux
system administration. You should make a plan before you start. You
will be restarting some critical services such as the name node and
job tracker, so some downtime is necessary. Given the value of the
data on your cluster, you’ll also want to be careful to take recent
back ups of any mission-critical data sets as well as the name node
meta-data.
Backing up your data is most important if you’re upgrading from a
version of Hadoop based on an Apache Software Foundation release
earlier than 0.20.
2. The CDH binaries need to be installed and configured on all the nodes to have a CDH-based cluster up and running.
From the Cloudera Manual
You can migrate the data from a CDH3 (or any Apache Hadoop) cluster to a CDH4 cluster by
using a tool that copies out data in parallel, such as the DistCp tool
offered in CDH4.
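As a rough illustration of the DistCp approach, the sketch below invokes it from Python; the NameNode hosts, ports, and paths are placeholders, and reading the source over hftp is the usual way to copy between Hadoop versions.
```python
# Hedged sketch: copy data from the old (CDH3 / Apache) cluster into the new
# CDH4 cluster with DistCp, run from the destination cluster. NameNode hosts,
# ports, and paths are placeholders for your environment.
import subprocess

src = "hftp://old-namenode:50070/user/data"   # hftp lets a newer cluster read an older HDFS
dst = "hdfs://new-namenode:8020/user/data"

subprocess.run(["hadoop", "distcp", src, dst], check=True)
```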
Other sources
Regarding your second question,
Again from the manual page
Important:
Before proceeding, you need to decide:
As a general rule:
The NameNode and JobTracker run on the same "master" host unless
the cluster is large (more than a few tens of nodes), and the master
host (or hosts) should not
run the Secondary NameNode (if used), DataNode or TaskTracker
services. In a large cluster, it is especially important that the
Secondary NameNode (if used) runs on a separate machine from the
NameNode. Each node in the cluster except the master host(s) should
run the DataNode and TaskTracker services.
Additionally, if you use Cloudera Manager, it will automatically do all the necessary setup, i.e. install the selected components on the nodes in the cluster.
Off-topic: I had a bad habit of not referring to the manual properly. Have a careful look at it; it answers all of these questions.
Answer to your second question:
You can add the machines directly after handling a few prerequisites, such as installing openssh-clients and Java and configuring the firewall.
These machines (the existing node and the three new nodes) should accept the same username and password, or you should set up passwordless SSH to these hosts.
You should be connected to the internet while adding the nodes.
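As a hedged sketch of the passwordless-SSH prerequisite, something like the following can be run from the Cloudera Manager host; the host names and the key path are placeholders.
```python
# Hedged sketch: set up passwordless SSH from the Cloudera Manager host to
# the new nodes before adding them. Host names and the key path are
# placeholders; ssh-copy-id prompts once per host for its password.
import os
import subprocess

key = os.path.expanduser("~/.ssh/id_rsa")
new_nodes = ["node1.example.com", "node2.example.com", "node3.example.com"]

if not os.path.exists(key):
    # Generate a key pair with an empty passphrase (-N "").
    subprocess.run(["ssh-keygen", "-t", "rsa", "-N", "", "-f", key], check=True)

for host in new_nodes:
    # Append our public key to the remote user's authorized_keys.
    subprocess.run(["ssh-copy-id", "-i", key + ".pub", host], check=True)
```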
I hope it will help you:)