Set up IBM Open Platform with an external Oracle Database - oracle

I'm a little confused about installing a single-node IBM Open Platform cluster that uses an Oracle database as the RDBMS.
First, I understand that the Hadoop part of IBM BigInsights is not a modified version of the corresponding Apache release (the same is true of Hortonworks), so when Ambari (from the IBM repo) offers me the option of an external Oracle database, I assume it should work. I may be wrong, and I can't find any reference to Oracle in the crappy IBM installation guide explaining how to set it up correctly (only that it should work with Oracle 11g R2).
So, as I would with the equivalent Hortonworks distribution (but using the binaries from IBM), I set up my ambari-server with all the Oracle parameters (--jdbc-db=oracle --jdbc-driver=path/to/ojdbc6.jar; I'm using Oracle 11g XE on CentOS 6.5, which is supposed to be supported by IOP) and specified everything Ambari needs to work with Oracle (Service Name, Host, Port, ...).
I created the ambari user, loaded the corresponding Oracle DDL (packaged with Ambari) and created my Hive & Oozie users, as specified in the... Hortonworks installation guide.
Well, Ambari seems to work well with Oracle, and I can set up my cluster up to the last step:
If I configure Hive and/or Oozie to work with Oracle (the Oracle connection validates OK from the service configuration tab), the "Review" step (step 8) doesn't show anything (or sometimes only the IOP repos; it seems to be arbitrary). Trying to deploy starts the task preparation and then leaves the installation blocked: I can't do anything other than drop the database and reload the entire DDL to try again (otherwise I get lots of unexpected NullPointerExceptions).
If I configure Hive AND Oozie to work with an embedded MySQL (the default choice), keeping Ambari against Oracle, everything works fine.
Am I doing something wrong? Or is there a limitation on configuring (IBM Open Platform) Hive and Oozie to use Oracle 11? (It works with Hortonworks, on the same Apache versions, and with the Cloudera distribution.)
Of course, log files don't tell me anything...
UPDATE:
I tried to install IOP 4.1, first using MySQL as the database for Ambari, Hive and Oozie, and everything was fine.
Next I tried to install IOP 4.1 with Oracle 11 XE as the external database. I configured Oracle, created the ambari, hive and oozie Oracle users, loaded the Ambari Oracle schema shipped with IOP 4.1, and configured the same cluster as the first time, specifying the Oracle settings for Hive and Oozie (and the Oracle driver for Sqoop). Before deploying the services to all the nodes, Ambari is supposed to summarize what it is going to install, but it doesn't: sometimes it shows nothing, sometimes only the IOP repo URLs. Then, when I try to deploy, it starts the preparation tasks but never finishes, and that's it. No message, no log, nothing; it just gets stuck.
As the components I want are at the same versions in IOP 4.1 and HDP 2.3 (Ambari 2.1, Hive 1.2.1, Oozie 4.2.0, Hadoop 2.7.1, Pig 0.15.0, Sqoop 1.4.6 and ZooKeeper 3.4.6), I tried to configure exactly the same cluster with HDP 2.3, Oracle 11 XE, ... and everything worked. I noticed that HDP 2.3 forces me to use SSL, while IOP does not. HDP uses the Oracle JDK 1.8 by default, while IOP offers OpenJDK 1.8 instead. I don't know if it matters; I'll test to be sure. I'll take screenshots of the Ambari screen when it blocks and copy the log traces, even if there's no error message...
If anyone got an idea, please share it!
Thanks!

Running the same installation with the Oracle JDK 1.8, everything works fine.
I don't know whether there is any restriction on using the Oracle JDBC driver with OpenJDK 1.8, but Oracle 11 XE with IOP 4.1 + Oracle JDK 1.8 works.
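Since the JDK was the only thing that changed between the failing and the working run, it may help to confirm which JVM a host actually uses before starting the installer. The following is only a minimal sketch: it assumes the usual banner text printed by `java -version` (OpenJDK builds mention "OpenJDK", Oracle builds print "Java(TM) SE Runtime Environment"), and the function names are hypothetical helpers, not part of Ambari or IOP.

```python
import subprocess


def classify_jvm(version_banner):
    """Guess the JDK flavour from the text printed by `java -version`."""
    text = version_banner.lower()
    if "openjdk" in text:
        return "openjdk"
    if "java(tm)" in text:
        # Assumption: Oracle JDK banners contain "Java(TM) SE Runtime Environment".
        return "oracle"
    return "unknown"


def detect_local_jvm(java_cmd="java"):
    """Run `java -version` and classify the banner it prints."""
    result = subprocess.run(
        [java_cmd, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # The banner normally goes to stderr; stdout is included just in case.
    return classify_jvm(result.stderr + result.stdout)
```

Calling `detect_local_jvm()` on the Ambari server host (or passing the full path of the `java` binary that Ambari was configured with) gives a quick hint about which of the two JDKs is in play.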

Related

Database versions for Oozie

I would like to change my Oozie installation from a MySQL db to an Oracle db.
My cluster is running CDH 5.4.7 with Oozie 4.1. The Oracle db that I have access to is version 12c.
In the Cloudera documentation it states that Oracle db 12c is only supported by Cloudera Manager and CDH 5.6 and newer.
My question is therefore: is there any reason why my Oozie installation should not be able to use this database, even though the Cloudera components do not support it? As far as I can tell, the Oozie documentation does not state anything version-related.
I am lacking a non-production system to test this on, but looking into setting one up currently.
Any answers, including speculation, are appreciated.
If any information is missing, I will gladly add it.
Thanks
Oozie in CDH 5.4.7 uses a quite old OpenJPA version, 2.2.2.
OpenJPA 2.2.2 does not support Oracle 12c.
However, CDH 5.8.0 still uses OpenJPA 2.2.2, so my guess is that it will probably work but was never tested. Make sure to create a backup of your DB before the migration. Also, you might try the DB migration tool developed in OOZIE-2632.
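For reference, pointing Oozie at Oracle comes down to the four `oozie.service.JPAService.jdbc.*` properties in oozie-site.xml. The sketch below only renders those properties; the host, service name and credentials in the usage note are placeholders, the helper names are hypothetical, and on a cluster managed by Cloudera Manager the same values would normally be entered through its configuration pages rather than written by hand.

```python
from xml.sax.saxutils import escape


def oozie_oracle_jpa_properties(host, port, service_name, user, password):
    """Build the oozie-site.xml JPAService settings for an Oracle database."""
    return {
        "oozie.service.JPAService.jdbc.driver": "oracle.jdbc.OracleDriver",
        # Oracle thin-driver URL in the service-name form.
        "oozie.service.JPAService.jdbc.url":
            "jdbc:oracle:thin:@//{}:{}/{}".format(host, port, service_name),
        "oozie.service.JPAService.jdbc.username": user,
        "oozie.service.JPAService.jdbc.password": password,
    }


def to_property_xml(props):
    """Render a dict as Hadoop-style <property> elements."""
    chunks = []
    for name in sorted(props):
        chunks.append(
            "<property>\n"
            "  <name>{}</name>\n"
            "  <value>{}</value>\n"
            "</property>".format(escape(name), escape(str(props[name])))
        )
    return "\n".join(chunks)
```

For example, `print(to_property_xml(oozie_oracle_jpa_properties("db.example.internal", 1521, "ORCL", "oozie", "secret")))` prints a fragment that can be compared against the existing MySQL settings before switching.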

How to configure HUE to be connected to remote Hive server?

I'm trying to use HUE Beeswax to connect to my company's Hive database. First, is it possible to connect HUE installed on my Mac to a remote Hive server? If so, how am I supposed to find the address of the Hive server, which is running on our private server? The only thing I can do is type 'hive' and run some SQL queries in the Hive shell. I have already installed HUE but can't figure out how to connect it to the remote Hive server. Any tips would be much appreciated.
If all you want is a desktop connection to Hive, you only need a JDBC client, not a full web app like Hue.
In any case, Hive CLI is deprecated. Beeline is preferred. To use Beeline and Hue, you need a HiveServer2 running.
To find the address of the HiveServer2, if you have one, you need to find the hive-site.xml file on the Hadoop cluster and export it. This information is also available in Ambari or Cloudera Manager (but if you're using a Cloudera CDH cluster, you already have Hue). The Thrift interface is what you want. The default port is 10000.
When you set up Hue, you will need to find the hue.ini file, edit the section that starts with [beeswax], and fill in the necessary values. Personally, I find that section fairly straightforward.
You can read the Hue GitHub repository to find the requirements for running it on a Mac.
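Once hive-site.xml has been copied from the cluster, the Thrift endpoint can be read out of it programmatically. This is a rough sketch that assumes the standard `hive.server2.thrift.bind.host` and `hive.server2.thrift.port` properties; the sample XML and its host name are made up for illustration, and a file that sets neither property falls back to the default port 10000.

```python
import xml.etree.ElementTree as ET

# Made-up sample; a real hive-site.xml has many more properties.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hive.example.internal</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
</configuration>"""


def hiveserver2_endpoint(hive_site_xml):
    """Return (host, port) for HiveServer2 from the text of a hive-site.xml."""
    root = ET.fromstring(hive_site_xml)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    # None means the file does not pin a bind host; use the server's own address.
    host = props.get("hive.server2.thrift.bind.host") or None
    # Fall back to the default Thrift port when the property is absent.
    port = int(props.get("hive.server2.thrift.port") or 10000)
    return host, port
```

With a real file, pass its contents instead of the sample, for instance `hiveserver2_endpoint(open("hive-site.xml").read())`.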

Browsing Hbase data in Hue through Phoenix

I am using CDH 5.4.4 and installed Phoenix parcel to be able to run SQL on hbase tables. Has anyone tried to browse that data using Hue?
I know since we can connect using JDBC connection to Phoenix, there must be a way for Hue to connect to it too.
The current status is that we would need to add HUE-2745, and then it would show up in DBQuery / Notebook.
The latest https://phoenix.apache.org/server.html is brand new and JDBC only.
If there were a HiveServer2 Thrift API or ODBC for Phoenix, it would work almost out of the box in the SQL or DB Query apps. Hue could work with JDBC, but that relies on a JDBC connector that is GPL-licensed (so it has to be installed separately).
The Hue jira for integrating Phoenix is https://issues.cloudera.org/browse/HUE-2121.

How to configure Hue-2.5.0 and HIve-0.11.0

For the past 2 days I have been working on setting up Hue, with no luck.
The Hue versions I tried with Hive 0.11.0: 3.5, 3.0, 2.4, 2.1, 2.3, 2.5
After much googling I learned that 3.5 and 3.0 (whose documentation says 0.11) are compatible with Hive 0.12 or 0.13, but as mine is 0.11 I faced issues like: Required client protocol, no database found, list index error.
Finally I was able to set up Hue 2.5.0 and it indeed connects with hiveserver2.
My Properties in hue.ini :
beeswax_server_host=localhost
server_interface=hiveserver2
beeswax_server_port=10000
hive_home_dir=/usr/lib/hive/hive-0.11.0
hive_conf_dir=/usr/lib/hive/hive-0.11.0/conf
All my tables are in Hive, but hiveserver2 does not show them when I access it using "beeline".
However, if I start the Hive Thrift server, I can access all my tables and schemas in RStudio.
I don't understand why hiveserver2 cannot access the Hive tables. Is it something different?
The hue.ini file gives only two options for connectivity: beeswax and hiveserver2.
I have done a lot of googling, but at this point nothing is helping.
Please let me know if:
hiveserver2 can import Hive data
OR
hiveserver can be used with Hue 2.5.0
OR
if I'm missing anything
If there is any more information required please let me know.
Apache Hive is missing some patches from CDH that have not been accepted by the community. The Thrift protocol version also differs depending on the release.
The current workarounds are to cherry-pick the missing patches from CDH or to use Hive from CDH.
You can read more here for example.
You should have a hive client installed on the Hue machine, with a configured hive-site.xml.
Then you can comment out the whole [[beeswax]] section and Hue should run correctly.

Is it possible to connect tableau to cloudera hive in windows 7?

I downloaded and installed the Cloudera Hive drivers provided at http://www.tableausoftware.com/support/drivers. But when I try to add the driver in ODBC connections, it is not shown there. I read somewhere that the Cloudera Hive driver works only with Windows 2008. I am using Windows 7. Kindly help me.
A little late in the day, but here are some more detailed articles from the Tableau Knowledge Base that may be of interest to you or anyone else interested in this question.
Connecting to Hadoop Hive
Extra Capabilities for Hadoop Hive
Designing for Performance Using Hadoop Hive
Administering Hadoop and Hive for Tableau Connectivity
Failing that, if you are still unable to connect to Cloudera Hive and you're a registered customer, or have downloaded a trial, then you can always drop an email to support@tableausoftware.com and ask for help there. :)
Yes it is possible to connect Tableau to cloudera Hive on Windows 7.
Steps are:
1. start the thrift server for hive
HIVE_PORT=10000 nohup hive --service hiveserver &
2. install the Hive ODBC driver from https://ccp.cloudera.com/display/con/Cloudera+Connector+for+Tableau+License+Agreement
3. open Tableau
Connect to Data -> Cloudera Hadoop Hive -> Give the server IP and port 10000 (you can change the Thrift server port if you need to by setting HIVE_PORT to some other value when starting the Hive server)
The rest is straightforward.
Also make sure that the required port (10000 or whichever you chose) is open in the firewall.
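Before blaming the ODBC driver, it can be worth checking from the client machine that the Thrift port is reachable at all. A minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper name and the host in the usage note is a placeholder for your Hive server.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open("hive.example.internal", 10000)` returning False suggests a firewall rule or a server that is not listening, rather than a Tableau or driver problem.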
Please make sure that you create the ODBC connection in the 32-bit ODBC administrator, since both the driver and Tableau Desktop are 32-bit applications. You can open the 32-bit ODBC panel with the odbcad32.exe command.

Resources