Q1: What is HiveServer2 in Hive?
Q2: What is the use of JDBC or ODBC with HiveServer2? For what purpose is HiveServer2 used with JDBC or ODBC?
Q3: If I want to connect to HiveServer2 via JDBC or ODBC, how can I connect? Can I connect on my Cloudera setup, which is a single node? Guide me on how to connect to it.
Q4: How do I connect with Beeline in Cloudera? Are the Beeline commands the same, or is there any difference? How do I connect Beeline with JDBC and ODBC?
Please help me with these questions. I searched on the internet but was unable to understand it. Thanks in advance.
Please find answers below:
A1. HiveServer2 is simply version 2 of the Hive server. The enhanced server is designed for multi-client concurrency and improved authentication, and it encourages clients to connect through JDBC and ODBC rather than through the Thrift protocol directly.
A2. JDBC/ODBC is the standard, recommended way to interact with SQL engines from programming languages. Apart from interacting with Hive on the command line (i.e. via beeline), clients can interact with it programmatically, as can external applications like Tableau or Qlik, which need the corresponding JDBC/ODBC drivers. The process is the same whether it's a single node or a distributed cluster.
A3. Please refer to the Cloudera documentation on how to set up and execute Hive commands using JDBC/ODBC. Check the link below:
http://www.cloudera.com/documentation/other/connectors/hive-jdbc/latest/Cloudera-JDBC-Driver-for-Apache-Hive-Install-Guide.pdf
A4. Check this link for a complete example - http://hadooptutorial.info/hiveserver2-beeline-introduction/
Hope that helps!!
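To make A2/A3 concrete, here is a small Python sketch that builds the `jdbc:hive2://` URL that both JDBC clients and beeline use to reach HiveServer2. The hostnames and the Kerberos principal below are placeholders, not values from the answers above:

```python
def hive2_jdbc_url(host, port=10000, database="default", principal=None):
    """Build a HiveServer2 JDBC connection URL.

    HiveServer2 listens on its Thrift port (10000 by default); JDBC
    clients and beeline both use the jdbc:hive2:// scheme.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if principal:  # Kerberos-secured clusters need the server principal
        url += f";principal={principal}"
    return url

# On a single-node Cloudera quickstart VM, HiveServer2 runs locally:
print(hive2_jdbc_url("localhost"))
# -> jdbc:hive2://localhost:10000/default

# On a Kerberos-secured cluster (hostname/principal are made up):
print(hive2_jdbc_url("hs2.example.com", principal="hive/_HOST@EXAMPLE.COM"))
```

The same URL works whether you pass it to beeline with `-u` or hand it to a JDBC `DriverManager` from Java.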
Related
I'm trying to use HUE Beeswax to connect to my company's Hive database. Firstly, is it possible for HUE installed on my Mac to connect to a remote Hive server? If so, how am I supposed to find the address of the Hive server running on our private server? The only thing I can do is type 'hive' and run some SQL queries in the Hive shell. I already installed HUE but can't figure out how to connect it to the remote Hive server. Any tips would be much appreciated.
If all you want is a desktop connection to Hive, you only need a JDBC client, not a full web app like Hue.
In any case, Hive CLI is deprecated. Beeline is preferred. To use Beeline and Hue, you need a HiveServer2 running.
To find the address of the HiveServer2, if you have one, you need to find the hive-site.xml file on the Hadoop cluster and export it. Other ways to get this information are available in Ambari or Cloudera Manager (though if you're using a Cloudera CDH cluster, you already have Hue). The Thrift interface is what you want; the default port is 10000.
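As a rough illustration of reading the address out of hive-site.xml, here is a Python sketch. The XML content and the hostname are made-up examples; on a real cluster the file usually lives under a path like /etc/hive/conf (check your distribution):

```python
import xml.etree.ElementTree as ET

# Example hive-site.xml content (hostname is a placeholder).
HIVE_SITE = """
<configuration>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hs2.example.com</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
</configuration>
"""

def hs2_address(hive_site_xml):
    """Pull the HiveServer2 Thrift host/port out of hive-site.xml."""
    root = ET.fromstring(hive_site_xml)
    props = {p.findtext("name"): p.findtext("value")
             for p in root.iter("property")}
    host = props.get("hive.server2.thrift.bind.host", "localhost")
    port = int(props.get("hive.server2.thrift.port", "10000"))
    return host, port

print(hs2_address(HIVE_SITE))  # ('hs2.example.com', 10000)
```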
When you set up Hue, you will need to find the hue.ini file and edit the section that starts with [beeswax], filling in the necessary values. Personally, I find that section fairly straightforward.
You can read the Hue GitHub page to find the requirements for running it on a Mac.
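For illustration, the relevant part of hue.ini typically looks something like this (the hostname is a placeholder; the port is HiveServer2's default):

```ini
[beeswax]
  # Host where HiveServer2 is running (placeholder -- use your server's address)
  hive_server_host=hs2.example.com
  # Thrift port HiveServer2 listens on (10000 is the default)
  hive_server_port=10000
```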
Let's imagine I have access to a Hive data warehouse that I can query using some web service. The problem is that I cannot automate the query using this service, so I would like to be able to query Hive from an external script (which I could automate).
So far I've only seen people running Hive on their local machine and querying it; I was wondering if it is possible to do it remotely? If yes, how?
Thanks a lot !
As far as I understand, you are asking whether there are ways to connect to Hive from a remote machine?
You could install the Hive client (beeline) on any remote machine and connect to Hive via JDBC.
Take a look here:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
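As an illustration of that approach, here is a Python sketch that assembles the beeline command line for a remote HiveServer2 (the hostname and user are placeholders; you would run the result on a machine that has the Hive client installed):

```python
def beeline_command(host, port=10000, user="hive", password=""):
    """Assemble the argv for a remote beeline connection, i.e.
    beeline -u jdbc:hive2://<host>:<port>/default -n <user> -p <password>
    """
    return ["beeline",
            "-u", f"jdbc:hive2://{host}:{port}/default",
            "-n", user,
            "-p", password]

cmd = beeline_command("hs2.example.com")
print(" ".join(cmd))
# You would actually execute this with subprocess.run(cmd).
```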
An easy way to do this is to deploy the Hadoop/YARN client configuration on the remote machine. If the remote cluster is secured with firewalls and Kerberos, you will need access to those first. After that, it's just a matter of starting up a Hive shell or submitting a job to YARN.
When you use Cloudera, you might be able to add the host to the cluster and install a "gateway" role for YARN and Hive on the target machine. This is very straightforward and requires just a few minutes of work.
Alternatively using the JDBC connector should also work, as stated in Facha's answer.
I am going through Apache Hive these days and the following thing is confusing me quite a bit -
There is a Hive Web Interface (hive --service hwi) that listens on a port (default 9999), allows a client to submit a query and come back for the results later, is equipped with authorization, etc.
There is also a HiveServer (hive --service hiveserver) that runs a server, allows remote clients to connect and submit Hive queries, and is also protected by authorization, etc.
How are they different (or are they not)? If they are different but offer the same kind of features, what is the difference?
There is also a HiveServer2 and a Thrift server, which I'm not sure about, but I think they are an improvement over HiveServer?
Can someone talk about them and clarify what is unique in each and what bigger problem they solve?
Regards,
(*Vipul)() ;
HWI
Hive's HWI (Hive Web Interface) is an alternative to the Hive command line interface. It provides features such as:
Schema browsing
Detached query execution
Manage sessions
No local installation
HiveServer
HiveServer, on the other hand, allows remote clients to submit requests to Hive using Thrift's various programming-language bindings. As HiveServer uses Thrift, it is sometimes called the Thrift Server.
HiveServer v1 cannot handle concurrent requests from more than one client; this limitation is addressed in HiveServer v2, which allows multiple concurrent connections from clients. HiveServer2 also provides:
authentication using Kerberos & LDAP
SSL encryption
PAM
HiveServer2 provides various client interfaces like:
Beeline command line shell
JDBC
Python & Ruby clients
The HiveServer2 JDBC driver can be used to connect BI tools like Tableau, Talend, etc. to perform ETL.
According to this page: https://spark.apache.org/sql/ you can connect existing BI tools to Spark SQL via ODBC or JDBC:
I don't mean Shark as this is basically EOL:
It is for this reason that we are ending development in Shark as a separate project and moving all our development resources to Spark SQL, a new component in Spark.
How would a BI tool (like Tableau) connect to Spark SQL via ODBC?
With the release of Spark SQL 1.1 you also have a Thrift JDBC server; see https://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine
Simba provides the ODBC driver that Databricks uses, however that is only for the Databricks distribution. We are launching the public version for use with Apache tomorrow (Wed, Dec 3rd) at www.simba.com. You'll be able to download and trial the driver for use with Tableau then.
As Carlos said, Stratio Meta is a module that acts as a parser, validator, planner, and coordinator layer over the different persistence layers (currently only Cassandra and Mongo, but also HDFS in the short term). This module offers a shell with a SQL-like language, a Java/Scala API, a REST API, and ODBC (JDBC shortly). It also uses another Stratio module, Stratio Deep, which allows us to use Apache Spark to execute queries in an efficient and fast way.
Disclaimer: I am currently employed by Stratio Big Data
Please take a look at: http://www.openstratio.org/blog/connecting-to-the-stratio-big-data-platform-using-odbc-2/
Stratio is a platform that includes a certified Spark distribution that allows you to connect Spark to any type of data repository (like Cassandra, MongoDB, ...). It has an ODBC driver, so you can write SQL queries that will be translated into Spark jobs, or, even faster, direct queries to Cassandra - or whichever database you want to connect to it - if possible. This way, it is pretty simple to connect Tableau to Spark and your data repository. If you need any help, we will be more than glad to assist you.
Disclaimer: I'm one of Stratio's ODBC developers
Simba will offer one: http://databricks.com/blog/2014/04/30/Databricks-selects-Simba-ODBC-driver-for-shark.html. No known official release date.
[update]
Use HIVE's ODBC driver to connect to Spark SQL as described here and here.
For Spark on Azure HDInsight, you can connect Tableau (or PowerBI) as described here https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-use-bi-tools/. The ODBC driver is here: http://www.microsoft.com/en-us/download/details.aspx?id=47713
I downloaded and installed the Cloudera Hive drivers provided at the link http://www.tableausoftware.com/support/drivers. But when I try to add the driver in ODBC connections, it is not shown there. I read somewhere that the Cloudera Hive driver will only work with Windows 2008. I am using Windows 7. Kindly help me.
A little late in the day, but here are some more detailed articles from the Tableau Knowledge Base that may be of interest to you or anyone else interested in this question:
Connecting to Hadoop Hive
Extra Capabilities for Hadoop Hive
Designing for Performance Using Hadoop Hive
Administering Hadoop and Hive for Tableau Connectivity
Failing that, if you are still unable to connect to Cloudera Hive and you're a registered customer, or have downloaded a trial, then you can always drop an email to support@tableausoftware.com and ask for help there. :)
Yes, it is possible to connect Tableau to Cloudera Hive on Windows 7.
Steps are:
1. Start the Thrift server for Hive:
nohup HIVE_PORT=10000 hive --service hiveserver &
2. Install the Hive ODBC driver from https://ccp.cloudera.com/display/con/Cloudera+Connector+for+Tableau+License+Agreement
3. Open Tableau:
Connect to Data -> Cloudera Hadoop Hive -> give the server IP and port 10000 (you can change the Thrift server port if you need to by setting HIVE_PORT to some other value when starting the Hive server).
The rest is straightforward.
Also make sure that the required port (10000, or whichever you chose) is open in the firewall.
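A quick way to verify the firewall point above is a plain TCP reachability check; here is a small Python sketch (the hostname in the comment is a placeholder):

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Handy for checking that the Hive server's Thrift port (10000 by
    default) is actually reachable through the firewall.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. port_is_open("hs2.example.com", 10000)
```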
Please make sure that you tried to create the ODBC connection in the 32-bit ODBC administrator, since the drivers and the Tableau desktop application are 32-bit. On 64-bit Windows you can launch the 32-bit ODBC panel with %windir%\SysWOW64\odbcad32.exe.