I'm trying to connect to an Oracle database using PySpark in a Databricks notebook. I cannot find any documentation on installing the library for the driver on the cluster.
Many thanks in advance.
If it is an interactive cluster, I'd use Maven for the installation. You can specify the coordinates or search for the package you want to install using the UI.
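For example, once the driver library is installed on the cluster (the Maven coordinate com.oracle.database.jdbc:ojdbc8 is one common choice, but check which version matches your database), a minimal PySpark read could look like the sketch below; host, port, service name, table, and credentials are all placeholders:

# Minimal sketch, assuming the Oracle JDBC driver has already been installed
# on the cluster as a library; `spark` is the SparkSession that Databricks
# notebooks provide automatically. All connection details are placeholders.
jdbc_url = "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCLPDB1"

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "MY_SCHEMA.MY_TABLE")
    .option("user", "my_user")
    .option("password", "my_password")
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    .load()
)

df.show(5)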
I have a Kubernetes cluster with Kylin for Back-End and Superset as Front-End.
Everything works great for the example "Default" database within the Kylin application.
Now I am trying to add a SQL Server database, for which I have added the following configuration to the $KYLIN_HOME/conf/kylin.properties file:
kylin.source.default=8
kylin.source.jdbc.connection-url=jdbc:sqlserver://hostname:1433;database=sample
kylin.source.jdbc.driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
kylin.source.jdbc.dialect=mssql
kylin.source.jdbc.user=your_username
kylin.source.jdbc.pass=your_password
kylin.source.jdbc.sqoop-home=/usr/hdp/current/sqoop-client
kylin.source.jdbc.filed-delimiter=|
As the documentation describes, I also added the SQL-SERVER-JDBC-Database-Driver jar file to the $KYLIN_HOME/ext/ directory.
In addition, the documentation also mentions installing Sqoop and adding the SQL-SERVER-JDBC-Database-Driver jar file to the $SQOOP_HOME/lib/ directory as well.
But inside the container I do not have pip to install it, so should I create a new image with pip and Sqoop installed? Is this the right way? And what does Kylin need?
UPDATE
After some investigation, I managed to install pip in case I needed it, because originally I thought I should install pysqoop, which didn't work. The documentation suggests installing Apache Sqoop, and I am not sure what I should download and where to place the files.
Kylin has a document on Setup JDBC Data Source.
The sqoop referred to here is Apache Sqoop, a bulk data transfer tool for Hadoop. Both Kylin and Sqoop are written in Java, so neither needs Python or pip.
I suggest investigating further in the Hadoop world. :-)
I'm trying to use Hue's Beeswax to connect to my company's Hive database. First, is it possible to use Hue installed on my Mac to connect to a remote Hive server? If so, how am I supposed to find the address of the Hive server running on our private server? The only thing I can do is type 'hive' and run some SQL queries in the Hive shell. I have already installed Hue but can't figure out how to connect it to the remote Hive server. Any tips would be much appreciated.
If all you want is a desktop connection to Hive, you only need a JDBC client, not a full web app like Hue.
In any case, the Hive CLI is deprecated; Beeline is preferred. To use either Beeline or Hue, you need HiveServer2 running.
To find the address of HiveServer2, if you have one, locate your hive-site.xml file on the Hadoop cluster and export it. This information is also available in Ambari or Cloudera Manager (though if you're using a Cloudera CDH cluster, you already have Hue). The Thrift interface is what you want; the default port is 10000.
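As a quick sanity check that HiveServer2 is actually answering on that Thrift port, you could use a small Python script with PyHive (PyHive is just one client option I'm assuming here, and it is not required for Hue; the host and username are placeholders):

# Rough sketch using PyHive (pip install "pyhive[hive]") to confirm that
# HiveServer2 is reachable; host and username below are placeholders.
from pyhive import hive

conn = hive.Connection(host="hiveserver2.example.com", port=10000, username="myuser")
cursor = conn.cursor()
cursor.execute("SHOW DATABASES")
print(cursor.fetchall())
conn.close()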
When you set up Hue, you will need to find the hue.ini file and edit the section that starts with [beeswax], filling in the necessary values. Personally, I find that section fairly straightforward.
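For reference, a minimal [beeswax] section might look like this (the host name is a placeholder; 10000 is HiveServer2's default Thrift port):

[beeswax]
  hive_server_host=hiveserver2.example.com
  hive_server_port=10000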
You can read the Hue GitHub page to find the requirements for running it on a Mac.
Kylin 2.1.0 is up and running on a Hadoop 2.7.0 cluster with HBase 1.2.6 and Hive 2.1.1 installed.
We also have Pentaho BI Server 6.1.0.1.196 (Mondrian 3.11 and Saiku) installed on another machine.
We want Pentaho to access cubes created in Kylin and use Saiku Analytics on them.
I did refer to a few suggestions on the internet but was unable to achieve my goal:
https://github.com/mustangore/kylin-mondrian-interaction
Any help on this is truly appreciated.
There is a product called KyAnalyzer, which integrates Mondrian and Saiku with Kylin seamlessly.
Here is the documentation for this product: https://kyligence.gitbooks.io/kap-manual/en/kyanalyzer/kyanalyzer.en.html
We are trying to do a proof of concept on Informatica Big Data Edition (not the cloud version), and I have seen that we might be able to use HDFS and Hive as source and target. But my question is: does Informatica connect to Cloudera Impala? If so, do we need any additional connector for that? I have done comprehensive research to check whether this is supported but could not find anything. Has anyone already tried this? If so, can you specify the steps and link to any documentation?
Informatica version: 9.6.1 (Hotfix 2)
You can use the ODBC driver provided by Cloudera.
http://www.cloudera.com/downloads/connectors/impala/odbc/2-5-22.html
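As a rough illustration (not Informatica-specific), once the Cloudera Impala ODBC driver is installed and a DSN is configured, any ODBC client can reach Impala; for example, a quick test from Python with pyodbc, where the DSN name is a placeholder:

# Sanity check through the Cloudera Impala ODBC driver via pyodbc;
# "Impala DSN" is whatever DSN name you configured for the driver.
import pyodbc

conn = pyodbc.connect("DSN=Impala DSN", autocommit=True)
cursor = conn.cursor()
cursor.execute("SHOW TABLES")
for row in cursor.fetchall():
    print(row)
conn.close()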
For Irene: you can use the same underlying driver; the one above is based on the Simba driver.
http://www.simba.com/drivers/hbase-odbc-jdbc/
I am using CDH 5.4.4 and installed the Phoenix parcel to be able to run SQL on HBase tables. Has anyone tried to browse that data using Hue?
Since we can connect to Phoenix over JDBC, I assume there must be a way for Hue to connect to it too.
The current status is that we would need to add HUE-2745, and then it would show up in the DB Query / Notebook apps.
The latest Phoenix Query Server (https://phoenix.apache.org/server.html) is brand new and JDBC only.
If there were a HiveServer2 Thrift API or ODBC driver for Phoenix, it would work almost out of the box in the SQL or DB Query apps. Hue could work with JDBC, but that would require a JDBC connector that is GPL-licensed (and so would need to be installed separately).
The Hue jira for integrating Phoenix is https://issues.cloudera.org/browse/HUE-2121.