Connect to Apache Spark Hive remotely through a JDBC client like Squirrel SQL

I have a running Spark cluster with Hive installed. I am able to run SQL queries through org.apache.spark.sql.hive.HiveContext locally via beeline, and the Hive thriftserver is running.
But I want to know how to connect to this Hive metastore from a remote computer through JDBC, without having to install Hive all over again on that remote system.
Please suggest which exact driver is needed and a JDBC client application such as Squirrel SQL Client.

The following jars will help (for the CDH distribution):
commons-configuration-1.6.jar
commons-logging-1.1.1.jar
commons-logging-1.1.3.jar
commons-logging-1.1.jar
commons-logging-1.2.jar
hadoop-common-2.6.0-cdh5.4.4-tests.jar
hadoop-common-2.6.0-cdh5.4.4.jar
hadoop-core-2.6.0-mr1-cdh5.4.4.jar
hive-exec-1.1.0-cdh5.4.4.jar
hive-jdbc-1.1.0-cdh5.4.4-standalone.jar
hive-jdbc-1.1.0-cdh5.4.4.jar
hive-service-1.1.0-cdh5.4.4.jar
libfb303-0.9.0.jar
libthrift-0.9.0.jar
log4j-1.2.16.jar
slf4j-api-1.7.5.jar
slf4j-log4j12-1.7.5.jar
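With those jars on the Squirrel SQL driver's classpath, you register a Hive driver and connect with settings along these lines; the host is a placeholder for your thriftserver and 10000 is the default HiveServer2/thriftserver port, so adjust both to your setup:
Driver class: org.apache.hive.jdbc.HiveDriver
Example URL: jdbc:hive2://<thriftserver-host>:10000/default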

Related

Run Azure Databricks Notebook from Apache NiFi

I am new to Apache NiFi. Is there any way to run an Azure Databricks notebook from NiFi, or does this need to be done with a different tool?
You can run an Azure Databricks notebook from Apache NiFi.
To connect to Databricks data in Apache NiFi:
Download the CData JDBC Driver for Databricks installer, unzip the package, and run the JAR file to install the driver.
Copy the CData JDBC Driver JAR file (and license file if it exists), cdata.jdbc.databricks.jar (and cdata.jdbc.databricks.lic), to the Apache NiFi lib subfolder, for example, C:\nifi-1.3.0-bin\nifi-1.3.0\lib.
On Windows, the default location for the CData JDBC Driver is C:\Program Files\CData\CData JDBC Driver for Databricks.
Start Apache NiFi. For example:
cd C:\nifi-1.3.0-bin\nifi-1.3.0\bin
run-nifi.bat
Lastly, navigate to the Apache NiFi UI in your web browser, typically http://localhost:8080/nifi.
You can refer to this article (https://www.cdata.com/kb/tech/databricks-jdbc-apache-nifi.rst) for more information.
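From the NiFi UI the driver is then registered in a DBCPConnectionPool controller service. As a rough sketch, the properties look like the following; the driver class name and connection URL format are assumptions based on CData's usual conventions, so verify them against the driver's own documentation:
Database Connection URL: jdbc:databricks:Server=<workspace-host>;HTTPPath=<http-path>;Token=<personal-access-token>
Database Driver Class Name: cdata.jdbc.databricks.DatabricksDriver
Database Driver Location(s): C:\nifi-1.3.0-bin\nifi-1.3.0\lib\cdata.jdbc.databricks.jar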

Problem connecting Presto to hive-hadoop3

I have Hadoop 3.1.2 and Hive 3.1.2 on a cluster, and I want to connect to Hive with presto-server-0.265.1.
I have just one catalog file in /opt/presto/etc/catalog, hive.properties, which contains:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://192.168.49.13:9083
The Presto service runs, but it cannot connect to Hive because I use Hadoop 3, and when I change hive.properties the Presto service does not start.
How can I connect to Hadoop 3?
Update:
It wasn't about Hadoop. The Hive metastore was not installed correctly, so Presto had a problem connecting to the Hive metastore.
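As a quick sanity check once the metastore is fixed, you can confirm that the thrift endpoint from hive.properties is actually reachable before restarting Presto; a minimal sketch in Python, using the host and port from the catalog file above:
import socket

# Verify the Hive metastore thrift port from hive.properties is listening.
with socket.create_connection(("192.168.49.13", 9083), timeout=5):
    print("metastore port is reachable")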

Connecting HiveServer2 from pyspark

I am stuck on how to use pyspark to fetch data from the Hive server using JDBC.
I am trying to connect to HiveServer2 running on my local machine from pyspark using JDBC. All components (HDFS, pyspark, HiveServer2) are on the same machine.
Following is the code I am using to connect:
connProps = {"username": "hive", "password": "", "driver": "org.apache.hive.jdbc.HiveDriver"}
sqlContext.read.jdbc(url="jdbc:hive2://127.0.0.1:10000/default", table="pokes", properties=connProps)

dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:hive2://localhost:10000/default") \
    .option("driver", "org.apache.hive.jdbc.HiveDriver").option("dbtable", "pokes") \
    .option("user", "hive").option("password", "").load()
Both methods used above give me the same error as below:
org.apache.spark.sql.AnalysisException: java.lang.RuntimeException:
java.lang.RuntimeException: Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
javax.jdo.JDOFatalDataStoreException: Unable to open a test connection
to the given database. JDBC url =
jdbc:derby:;databaseName=metastore_db;create=true, username = APP.
Terminating connection pool (set lazyInit to true if you expect to
start your database after your app).
ERROR XSDB6: Another instance of Derby may have already booted the database /home///jupyter-notebooks/metastore_db
metastore_db is located in the same directory where my Jupyter notebooks are created, but hive-site.xml has a different metastore location.
I have already checked other questions about the same error, which say another spark-shell or similar process is running, but that is not the case. Even if I try the following command when HiveServer2 and HDFS are down, I get the same error:
spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
I am able to connect to Hive from a Java program using JDBC. Am I missing something here? Please help. Thanks in advance.
Spark should not use JDBC to connect to Hive.
It reads from the metastore directly and skips HiveServer2.
However, "Another instance of Derby may have already booted the database" means that you're running Spark from another session, such as another Jupyter kernel that's still running. Try setting a different metastore location, or set up a remote Hive metastore using a local MySQL or Postgres database, and edit $SPARK_HOME/conf/hive-site.xml with that information.
From SparkSQL - Hive tables
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .enableHiveSupport() \
    .getOrCreate()

# spark is an existing SparkSession
spark.sql("CREATE TABLE...")

sqoop + cloudera manager jdbc driver not found

I am trying to set up the JDBC driver for sqoop in cloudera manager.
Here is some background on my setup:
1) I have a 5-machine Hadoop cluster running CDH 4.5 on Ubuntu
2) Installed Sqoop through Cloudera Manager
I have already downloaded the latest JDBC MySQL connector jar and copied it to the following locations:
sudo cp /home/clouderasudo/jbdcDriver/mysql-connector-java-5.1.29-bin.jar /usr/lib/sqoop/lib/
sudo cp /home/clouderasudo/jbdcDriver/mysql-connector-java-5.1.29-bin.jar /usr/lib/oozie/lib/
But I still get the error below when I try to set up a new job in Sqoop with com.mysql.jdbc.Driver as the JDBC driver class:
Can't load specified driver
Any help appreciated.
You may need to copy the database driver file into the directory that contains the Sqoop library; in my case it is /opt/cloudera/parcels/CDH/lib/sqoop/lib
Savio
For me it was the HDFS path /user/oozie/share/lib/lib_20140909154837/sqoop; after copying the driver there, restart Oozie in Cloudera Manager. The copy commands for both locations are sketched below.
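Putting the two answers together, the copy steps for a parcel-based CDH install would look roughly like the following; the jar name and sharelib timestamp are simply the ones mentioned in this thread, so adjust them to your cluster:
sudo cp mysql-connector-java-5.1.29-bin.jar /opt/cloudera/parcels/CDH/lib/sqoop/lib/
hdfs dfs -put mysql-connector-java-5.1.29-bin.jar /user/oozie/share/lib/lib_20140909154837/sqoop/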

phoenix hbase not connecting remotely

I have two Cloudera VMs, and on both I've configured Phoenix; it works fine as long as it is localhost.
When I try to connect to HBase on one VM from Phoenix on the other VM, I use this command:
$ ./sqlline.sh xxx.xx.xx.xx:2181
The connection is successful, but Phoenix is still referencing the local HBase and not the remote HBase. Can anyone tell me where the problem is?
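For reference, sqlline.sh also accepts the full Phoenix quorum string with an explicit znode parent, which makes it unambiguous which cluster is being targeted; a hedged sketch, where the IP and the /hbase znode are placeholders that should be checked against the remote cluster's hbase-site.xml:
$ ./sqlline.sh xxx.xx.xx.xx:2181:/hbase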
