How to configure Hue to connect to a remote Hive server? - hadoop

I'm trying to use Hue Beeswax to connect to my company's Hive database. First, is it possible to use Hue installed on my Mac to connect to a remote Hive server? If so, how do I find the address of the Hive server that is running on our private server? The only thing I can do is type 'hive' and run some SQL queries in the Hive shell. I have already installed Hue but can't figure out how to connect it to the remote Hive server. Any tips would be much appreciated.

If all you want is a desktop connection to Hive, you only need a JDBC client, not a full web app like Hue.
In any case, the Hive CLI is deprecated and Beeline is preferred. To use Beeline or Hue, you need a running HiveServer2.
To find the address of HiveServer2, if you have one, locate the hive-site.xml file on the Hadoop cluster and copy it to your machine; the hive.server2.thrift.bind.host and hive.server2.thrift.port properties hold the host and port. The same information is also available in Ambari or Cloudera Manager (but if you're using a Cloudera CDH cluster, you already have Hue). The Thrift interface is what you want. The default port is 10000.
When you set up Hue, find the hue.ini file, edit the section that starts with [beeswax], and fill in the necessary values. Personally, I find that section fairly straightforward.
You can read the Hue GitHub repository to find the requirements for running it on a Mac.
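As a rough sketch of the two steps above: the first helper below pulls a property value out of a copy of hive-site.xml, and the second prints a [beeswax] section for hue.ini. The function names, the file path, and the host name hive.example.internal are made up for illustration. hive_server_host and hive_server_port are the keys used in the [beeswax] section of Hue's sample hue.ini, but check them against the hue.ini shipped with your Hue version.

```shell
# Sketch only: helper names, paths and host names are illustrative assumptions.

# Print the <value> that follows <name>$1</name> in the hive-site.xml at $2.
# Prints nothing if the property is not set in the file.
hive_site_property() {
  awk -v name="$1" '
    index($0, "<name>" name "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$2"
}

# Print a [beeswax] section for hue.ini pointing at HiveServer2 on host $1, port $2.
beeswax_section() {
  printf '[beeswax]\n  hive_server_host=%s\n  hive_server_port=%s\n' "$1" "$2"
}

# Example (assumes hive-site.xml was copied from the cluster to the current directory):
#   port=$(hive_site_property hive.server2.thrift.port ./hive-site.xml)
#   beeswax_section hive.example.internal "${port:-10000}"
```

If the port property is missing from hive-site.xml, HiveServer2 is using its default, which is why the example falls back to 10000.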

Related

Query Hive remotely using shell

Let's imagine I have access to a Hive data warehouse that I can query through a web service. The problem is that I cannot automate queries through this service, so I would like to be able to query Hive from an external script (which I could then automate).
So far, I've only seen people running Hive on their local machine and querying it there. Is it possible to do this remotely? If yes, how?
Thanks a lot!
As far as I understand, you are asking whether there are ways to connect to Hive from a remote machine.
You could install the Hive client (Beeline) on any remote machine and connect to Hive via JDBC.
Take a look here:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
An easy way to do this is to deploy the Hadoop/YARN client configuration on the remote machine. If the remote cluster is secured with firewalls and Kerberos, you will need access through those first. After that, it's just a matter of starting a Hive shell or submitting a job to YARN.
If you use Cloudera, you might be able to add the host to the cluster and install a "gateway" role for YARN and Hive on the target machine. This is very straightforward and requires just a few minutes of work.
Alternatively, using the JDBC connector should also work, as stated in Facha's answer.
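For the automation case in the question, a small wrapper around Beeline is often enough. This is a sketch under assumptions: hs2.example.internal, the user name etl_user, and the function names are placeholders, and it presumes Beeline is installed on the remote machine and HiveServer2 is reachable on its Thrift port without Kerberos.

```shell
# Sketch only: host, user and function names are placeholders.

# Build a HiveServer2 JDBC URL from host ($1), port ($2, default 10000)
# and database ($3, default "default").
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s\n' "$1" "${2:-10000}" "${3:-default}"
}

# Run one query ($2) against the JDBC URL ($1) as user ($3), printing CSV to stdout.
run_hive_query() {
  beeline -u "$1" -n "$3" --silent=true --outputformat=csv2 -e "$2"
}

# Example, e.g. from a cron job:
#   url=$(hive_jdbc_url hs2.example.internal)
#   run_hive_query "$url" 'SHOW TABLES' etl_user > tables.csv
```

On a Kerberos-secured cluster the URL needs additional parameters and a valid ticket, which this sketch does not cover.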

Apache Hive Installation on pseudo distributed or multi node cluster environment

I have installed Hadoop in a multi-node environment on my PC as follows:
1: 4 VirtualBox instances running Ubuntu (14.04)
2: 1 master node, 2 slave nodes, and the remaining VM instance acting as a client
Note: All 4 VMs are running on my PC itself
I was able to complete the Apache Hadoop 2.6 setup successfully on the setup described above. Now I want to install Hive in order to do some data summarization, querying, and analysis.
But I am not sure how to proceed. I have a few questions:
Q1: Do I need to install Apache Hive (0.14) on all nodes (master/name node and slave/data nodes), or only on the master node?
Q2: Which mode should be used for the metastore: local mode or remote mode?
Q3: If I want to use MySQL for the Hive metastore, should I install it on the master/name node itself, or do I need a separate client machine for this?
Could someone also share the steps to configure the metastore in a multi-node/pseudo-distributed environment?
BR,
San
You need to install the required Hive services (HiveServer2, Metastore, WebHCat) only once. In your lab scenario, you would probably put them on the master. The client can then run Beeline (the HiveServer2 client).
If you configure the Metastore as local (the default embedded setup), Hive will use a local Derby database. Again, for your lab setup, this is probably just what you need.
In a production scenario, you would set up a dedicated server for the supporting services, so that they do not compete for resources with the NameNode process(es), and use a dedicated database server for your Metastore database, which will then be remote.
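To make the remote, MySQL-backed Metastore from the production scenario more concrete, here is a sketch that prints the relevant hive-site.xml properties. The host names, the database name metastore, the user hive, and the password changeme are placeholders, and the function name is made up. The property names are standard Hive settings, 9083 is the usual Metastore Thrift port, and com.mysql.jdbc.Driver is the driver class of MySQL Connector/J 5.x, whose jar would need to be on Hive's classpath.

```shell
# Sketch only: hosts, database name and credentials are placeholders.
# Print hive-site.xml properties for a Metastore service on host $1
# backed by a MySQL server on host $2.
metastore_site_xml() {
  cat <<EOF
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://$1:9083</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://$2:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>changeme</value>
  </property>
</configuration>
EOF
}

# Example: metastore_site_xml master.example.internal db.example.internal > hive-site.xml
```

Only the host running the Metastore service needs the javax.jdo.option.* properties; clients that talk to a remote Metastore only need hive.metastore.uris.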

Connect Tableau with Elasticsearch via Spark SQL

I found a post that discusses connecting Tableau to Elasticsearch via Hive SQL. I was wondering if there is a way to connect to Elasticsearch via Spark SQL instead, as I am not very familiar with Hive.
Thanks.
@busybug91,
The right driver is here; please try this one. It could solve your issue.
@NicholasY I got it resolved after a couple of trials. Two steps that I took:
1. I wasn't using the right driver for the connection. I was using the DataStax Enterprise driver. However, they have a driver for Spark SQL as well. I used the Windows 64-bit version of the driver. The MapR Hadoop Hive and Hortonworks Hadoop Hive drivers didn't work, as I have Apache Hive.
2. Once I used the right driver (from DataStax), I realized that my Hive metastore and spark-thrift-server were running on the same port. I changed spark-thrift-server's port to 10001 and a successful connection was established.
A new problem: I've created an external table in Hive, and I am able to query the data as well. I start hive-metastore as a service. However, as mentioned on this link, I am not able to see my Hive tables in Spark SQL. My connection from Tableau to Spark SQL is of no use unless I can see the tables from the Hive metastore! When I run show tables; in Spark SQL (via the spark-sql shell, with the Hive metastore running as a service at the same time), it runs a job that reports a completion time but no table names. I monitored it via localhost:4040 and saw that the input and output sizes are 0.0. I believe I am not getting the Hive tables in Spark SQL, which is why I don't see any tables after the connection is established from Tableau to Spark SQL.
EDIT
I changed metastore from derby to mysql for both hive and spark sql.
I'm trying to do the same thing, so maybe I can help you get started.
First, compile a Spark SQL version with Hive and the Thrift server (ver 0.13):
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
You need a hive-site.xml that is properly configured to work with Hive, and you need to copy it to the spark/conf folder.
Then, you have to set $CLASSPATH to the elasticsearch-hadoop jar path.
Be careful! Spark SQL 1.2.0 does not work with elasticsearch-hadoop-2.0.x. You have to use elasticsearch-hadoop-2.1.0-Beta4 or a BUILD-SNAPSHOT, available here.
To finish, run the Thrift server with something like this:
./start-thriftserver.sh --master spark://master:7077 --driver-class-path $CLASSPATH --jars /root/spark-sql/spark-1.2.0/lib/elasticsearch-hadoop-2.1.0.Beta4.jar --hiveconf hive.server2.thrift.bind.host=0.0.0.0 --hiveconf hive.server2.thrift.port=10000
It works for me, but only on a small docType (5000 rows); data co-location does not seem to be working. I'm looking for a way to put elasticsearch-hadoop.jar on each Spark worker, as ryrobes did for Hadoop.
If you find a way to keep access to Elasticsearch local, let me know ;)
HTH,

Cloudera beeswax server and hive server

I have a fundamental question regarding the two servers mentioned in the context of the Cloudera CDH4 distribution.
Are the two interchangeable, as in, could you run Beeswax in place of Hive Server?
I'm trying to use a Thrift client to connect, and in my setup only Beeswax is running, not Hive Server. In that case, can I connect to the Beeswax server?
Hive Server is the default process and Beeswax is a newer process designed to better support concurrency and provide authentication using Kerberos. You should run one or the other.
And yes, you should definitely be able to connect to beeswax using Thrift. You can find clients for Beeswax and Hive server here.
What is the difference between HiveServer2 and Beeswax? They are both designed to better support concurrency and security.

Is it possible to connect tableau to cloudera hive in windows 7?

I downloaded and installed the Cloudera Hive drivers provided at http://www.tableausoftware.com/support/drivers. But when I try to add the driver in ODBC connections, it is not shown there. I read somewhere that the Cloudera Hive driver works only with Windows 2008. I am using Windows 7. Kindly help me.
A little late in the day, but here are some more detailed articles from the Tableau Knowledge Base that may be of interest to you or anyone else interested in this question.
Connecting to Hadoop Hive
Extra Capabilities for Hadoop Hive
Designing for Performance Using Hadoop Hive
Administering Hadoop and Hive for Tableau Connectivity
Failing that, if you are still unable to connect to Cloudera Hive and you're a registered customer, or have downloaded a trial, then you can always drop an email to support@tableausoftware.com and ask for help there. :)
Yes it is possible to connect Tableau to cloudera Hive on Windows 7.
Steps are:
1. start the thrift server for hive
HIVE_PORT=10000 nohup hive --service hiveserver &
2. install the Hive ODBC driver from https://ccp.cloudera.com/display/con/Cloudera+Connector+for+Tableau+License+Agreement
3. open Tableau
Connect to Data -> Cloudera Hadoop Hive -> Give the server ip and port :10000 (you can change the thrift server port if you need to by changing HIVE_PORT to some other value while starting the Hive server)
The rest is straightforward.
Also make sure that the required port (10000 or which ever you chose) is open in the firewall.
Please make sure that you create the ODBC connection in the 32-bit ODBC administrator, since the drivers and Tableau Desktop are 32-bit applications. You can open the 32-bit ODBC panel by running odbcad32.exe from the command line (on 64-bit Windows, the 32-bit version is located in C:\Windows\SysWOW64).
