Browsing HBase data in Hue through Phoenix - JDBC

I am using CDH 5.4.4 and installed the Phoenix parcel to be able to run SQL on HBase tables. Has anyone tried to browse that data using Hue?
I know that since we can connect to Phoenix over JDBC, there must be a way for Hue to connect to it too.

The current status is that we would need to add HUE-2745, and then it would show up in the DB Query / Notebook apps.
The Phoenix Query Server (https://phoenix.apache.org/server.html) is brand new and JDBC only.
If there were a HiveServer2 Thrift API or an ODBC driver for Phoenix, it would work almost out of the box in the SQL or DB Query apps. Hue could work with JDBC, but that requires a JDBC connector that is GPL-licensed (so it has to be installed separately).
The Hue jira for integrating Phoenix is https://issues.cloudera.org/browse/HUE-2121.
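In the meantime, a quick way to sanity-check the Phoenix side independently of Hue is the sqlline.py shell that ships with the parcel. A rough sketch; the parcel path, "zk-host", and "queryserver-host" are placeholders for your cluster:

    # Thick JDBC client: connect through the ZooKeeper quorum
    # (the JDBC URL form is jdbc:phoenix:zk-host:2181)
    /opt/cloudera/parcels/CLABS_PHOENIX/bin/sqlline.py zk-host:2181

    # Thin client against the new Phoenix Query Server (default port 8765)
    sqlline-thin.py http://queryserver-host:8765

If either of these works, any JDBC client (including a future Hue connector) should be able to use the same URL.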

Related

How to configure HUE to connect to a remote Hive server?

I'm trying to use HUE Beeswax to connect to my company's Hive database. First, is it possible for HUE installed on my Mac to connect to a remote Hive server? If so, how am I supposed to find the address of the Hive server running on our private server? The only thing I can do right now is type 'hive' and run some SQL queries in the Hive shell. I have already installed HUE but can't figure out how to connect it to the remote Hive server. Any tips would be much appreciated.
If all you want is a desktop connection to Hive, you only need a JDBC client, not a full web app like Hue.
In any case, the Hive CLI is deprecated; Beeline is preferred. To use Beeline and Hue, you need HiveServer2 running.
To find the address of HiveServer2, if you have it, you need to find your hive-site.xml file on the Hadoop cluster and export it. Other ways to get this information are available in Ambari or Cloudera Manager (but if you're using a Cloudera CDH cluster, you already have Hue). The Thrift interface is what you want; the default port is 10000.
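For example, once you have the host, a quick Beeline check against that Thrift port confirms HiveServer2 is reachable (the host and user below are placeholders):

    # Connect to HiveServer2 over its JDBC/Thrift interface
    beeline -u "jdbc:hive2://hiveserver2-host:10000" -n your_user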
When you set up Hue, you will need to find the hue.ini file and edit the section that starts with [beeswax], filling in the necessary values. Personally, I find that section fairly straightforward.
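As a rough sketch of what that section looks like (the host is a placeholder; the key names are from a stock hue.ini):

    [beeswax]
      # Host where HiveServer2 is running
      hive_server_host=hiveserver2-host
      # Port HiveServer2 listens on (10000 by default)
      hive_server_port=10000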
You can read the Hue GitHub page to find the requirements for running it on a Mac.

Need to load Hana table through Spark, with no Spark Vora integration as such

I have a requirement where I have to load data from Hadoop into SAP HANA. I have already worked with MySQL, DB2, and a few other RDBMSs with Spark, loading them using the JDBC Spark DataFrame API in version 1.5.0 and above, and also with Cassandra and Hive, but not HANA. Is it possible to do so without any modifications on the HANA side, as I can't touch the HANA installation in any way?
You could use Sqoop if you prefer to stay on the Hadoop side.
SAP BusinessObjects Data Services with Hive Adapter also works fine.
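To flesh out the Sqoop suggestion above, a minimal sketch might look like the following, assuming SAP's ngdbc.jar JDBC driver has been copied into Sqoop's lib directory; the host, port, credentials, schema, and paths are placeholders:

    # Export delimited files from HDFS into a HANA table through
    # SAP's JDBC driver; nothing needs to change on the HANA side.
    sqoop export \
      --connect "jdbc:sap://hana-host:30015/" \
      --driver com.sap.db.jdbc.Driver \
      --username HANA_USER --password 'secret' \
      --table "MYSCHEMA.TARGET_TABLE" \
      --export-dir /user/hive/warehouse/source_table \
      --input-fields-terminated-by ','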

Set up IBM Open Platform with an external Oracle Database

I'm a little confused when I try to install a single-node IBM Open Platform cluster using an Oracle database as the RDBMS.
Firstly, I understand that the Hadoop part of IBM BigInsights is not a modified version of the corresponding Apache version (just as Hortonworks' is not), so when Ambari (from the IBM repo) offers to use an external Oracle database, I suppose it should work. I may be wrong, and I can't find any Oracle reference in the crappy IBM installation guide to set it up correctly (only that it should work with Oracle 11g R2).
So, as I do with an equivalent Hortonworks distribution (but using the binaries from IBM), I set up my ambari-server with all the Oracle parameters (--jdbc-db=oracle --jdbc-driver=path/to/ojdbc6.jar; I'm using an Oracle 11g XE on CentOS 6.5, supposedly supported by IOP), and I specified all the stuff I had to specify to use Ambari with Oracle (service name, host, port, ...).
I created the ambari user, loaded the corresponding Oracle DDL (packaged with Ambari), and created my Hive & Oozie users, as specified in the... Hortonworks installation guide.
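For reference, the setup sequence was roughly the following (a sketch, not the exact commands; the driver path and password are placeholders, and the DDL script ships under /var/lib/ambari-server/resources):

    # Register the Oracle JDBC driver with Ambari
    ambari-server setup --jdbc-db=oracle --jdbc-driver=/path/to/ojdbc6.jar

    # Load the Ambari schema as the pre-created "ambari" Oracle user
    sqlplus ambari/ambari_password @/var/lib/ambari-server/resources/Ambari-DDL-Oracle-CREATE.sql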
Well, Ambari seems to work well with Oracle; I can set up my cluster until the last step:
If I configure Hive and/or Oozie to work with Oracle (validating the Oracle connection from the service configuration tab succeeds), the "review" step (step 8) doesn't show anything (or sometimes only the IOP repos; it seems to be arbitrary). Trying to deploy starts the task preparation and leaves the installation in a blocked state: I can't do anything other than drop the database and reload the entire DDL to try again (or I'll get lots of unexpected NullPointerExceptions).
If I configure Hive AND Oozie to work with an embedded MySQL (the default choice), keeping Ambari on Oracle, everything works fine.
Am I doing something wrong?? Or is there some limitation to configuring (IBM Open Platform) Hive and Oozie to use Oracle 11, when it works with the Hortonworks (same Apache versions) and Cloudera distributions?
Of course, log files don't tell me anything...
UPDATE:
I tried to install IOP 4.1, first using MySQL as my Ambari, Hive, and Oozie database; everything was fine.
Next I tried to install IOP 4.1 with Oracle 11 XE as the external database (I configured Oracle, created the ambari, hive, and oozie Oracle users, and loaded the Ambari Oracle schema shipped with IOP 4.1), and I configured the same cluster as the first time, specifying the Oracle particularities for Hive, Oozie, and Sqoop (Oracle driver). Before deploying the services to all the nodes, Ambari is supposed to summarize what it is going to install, but it doesn't: sometimes it shows nothing, sometimes only the IOP repo URLs. Next, trying to deploy, it starts the preparation tasks but never finishes. And that's it. No message, no log, nothing; it just gets stuck.
As the desired components of IOP 4.1 are the same versions as in HDP 2.3 (Ambari 2.1, Hive 1.2.1, Oozie 4.2.0, Hadoop 2.7.1, Pig 0.15.0, Sqoop 1.4.6, and ZooKeeper 3.4.6), I tried to configure exactly the same cluster with HDP 2.3, Oracle 11 XE, ... and everything worked. I noticed that HDP 2.3 forces me to use SSL, while IOP does not. HDP works with an Oracle JDK 1.8 by default, while IOP actually offers to use an OpenJDK 1.8 instead. I don't know if it matters; I'll try to make sure... I'll take pictures of the Ambari screen when it blocks and copy the log traces, even if there's no error message...
If anyone got an idea, please share it!
Thanks!
Trying the same installation using the Oracle JDK 1.8, everything works fine.
I don't know if there is any restriction on using the Oracle JDBC driver with OpenJDK 1.8, but using Oracle 11 XE with IOP 4.1 + Oracle JDK 1.8 works.
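For anyone hitting the same thing, switching Ambari to an already-installed Oracle JDK is a one-liner (the JDK path is a placeholder; the same path must exist on every host):

    # Re-run setup pointing Ambari at a custom JDK location
    ambari-server setup -j /usr/java/jdk1.8.0_60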

Connect to Spark SQL via ODBC

According to this page (https://spark.apache.org/sql/), you can connect existing BI tools to Spark SQL via ODBC or JDBC.
I don't mean Shark, as it is basically EOL:
It is for this reason that we are ending development in Shark as a separate project and moving all our development resources to Spark SQL, a new component in Spark.
How would a BI tool (like Tableau) connect to Spark SQL via ODBC?
With the release of Spark SQL 1.1 you also have the Thrift JDBC server; see https://spark.apache.org/docs/latest/sql-programming-guide.html#distributed-sql-engine
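Concretely, that means starting the Thrift JDBC/ODBC server that ships in Spark's sbin directory and pointing any HiveServer2-compatible client (ODBC or JDBC) at it, along these lines:

    # From the Spark installation directory: start the Thrift JDBC/ODBC server
    ./sbin/start-thriftserver.sh

    # Quick test with the bundled Beeline client (default port 10000)
    ./bin/beeline -u jdbc:hive2://localhost:10000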
Simba provides the ODBC driver that Databricks uses; however, that is only for the Databricks distribution. We are launching the public version for use with Apache Spark tomorrow (Wed, Dec 3rd) at www.simba.com. You'll be able to download and trial the driver for use with Tableau then.
As Carlos said, Stratio Meta is a module that acts as a parser, validator, planner, and coordinator layer over different persistence layers (currently only Cassandra and MongoDB, but also HDFS in the short term). This module offers a shell with a SQL-like language, a Java/Scala API, a REST API, and ODBC (JDBC shortly). It also uses another Stratio module, Stratio Deep, which allows us to use Apache Spark to execute queries in an efficient and fast way.
Disclaimer: I am currently employed by Stratio Big Data
Please take a look at: http://www.openstratio.org/blog/connecting-to-the-stratio-big-data-platform-using-odbc-2/
Stratio is a platform that includes a certified Spark distribution that allows you to connect Spark to any type of data repository (like Cassandra, MongoDB, ...). It has an ODBC driver, so you can write SQL queries that will be translated into Spark jobs, or, even faster, direct queries to Cassandra (or whichever database you want to connect to it) where possible. This way, it is pretty simple to connect Tableau to Spark and your data repository. If you need any help, we will be more than glad to assist you.
Disclaimer: I'm one of Stratio's ODBC developers
Simba will offer one: http://databricks.com/blog/2014/04/30/Databricks-selects-Simba-ODBC-driver-for-shark.html. No known official release date.
[update]
Use Hive's ODBC driver to connect to Spark SQL as described here and here.
For Spark on Azure HDInsight, you can connect Tableau (or PowerBI) as described here https://azure.microsoft.com/en-us/documentation/articles/hdinsight-apache-spark-use-bi-tools/. The ODBC driver is here: http://www.microsoft.com/en-us/download/details.aspx?id=47713

Is it possible to connect Tableau to Cloudera Hive on Windows 7?

I downloaded and installed the Cloudera Hive drivers provided at the link http://www.tableausoftware.com/support/drivers. But when I try to add the driver in ODBC connections, it is not shown there. I read somewhere that the Cloudera Hive driver will work only with Windows 2008. I am using Windows 7. Kindly help me.
A little late in the day, but here are some more detailed articles from the Tableau Knowledge Base that may be of interest to you or anyone else interested in this question.
Connecting to Hadoop Hive
Extra Capabilities for Hadoop Hive
Designing for Performance Using Hadoop Hive
Administering Hadoop and Hive for Tableau Connectivity
Failing that, if you are still unable to connect to Cloudera Hive and you're a registered customer, or have downloaded a trial, then you can always drop an email to support@tableausoftware.com and ask for help there. :)
Yes, it is possible to connect Tableau to Cloudera Hive on Windows 7.
Steps are:
1. Start the Thrift server for Hive:
nohup HIVE_PORT=10000 hive --service hiveserver &
2. Install the Hive ODBC driver from https://ccp.cloudera.com/display/con/Cloudera+Connector+for+Tableau+License+Agreement
3. Open Tableau:
Connect to Data -> Cloudera Hadoop Hive -> give the server IP and port 10000 (you can change the Thrift server port if you need to by setting HIVE_PORT to some other value when starting the Hive server)
The rest is straightforward.
Also make sure that the required port (10000, or whichever you chose) is open in the firewall.
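A quick way to check that the port is actually reachable from the Tableau machine (the host is a placeholder; on Windows 7 the Telnet client may first need to be enabled under Windows features):

    # If the connection opens instead of timing out, the port is reachable
    telnet hive-server-host 10000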
Please make sure that you tried to create the ODBC connection in the 32-bit ODBC administrator, since the drivers and the Tableau Desktop application are 32-bit. You can open the 32-bit ODBC driver panel with the odbcad32.exe command.
