How to load a table from SQL Server using H2O in R? - h2o

I tried to load a table into R using h2o but got the following error:
my_data <- h2o.import_sql_table(my_sql_conn, table, username, password)
ERROR: Unexpected HTTP Status code: 500 Server Error (url = http://localhost:54321/99/ImportSQLTable)
java.lang.RuntimeException [1] "java.lang.RuntimeException: SQLException: No suitable driver found for jdbc:mysql://10.140.20.29/MySQL?&useSSL=false\nFailed to connect and read from SQL database with connection_url: jdbc:mysql://10.140.20.29/MySQL?&useSSL=false"
Can someone help me with this? Thank you so much!

You need a supported JDBC driver (built on JDBC 4.2 core) to connect from H2O to SQL Server. You can download the Microsoft JDBC Driver 4.2 for SQL Server from the link below first:
https://www.microsoft.com/en-us/download/details.aspx?id=54671
After that, please follow the article below to first test the JDBC driver from the R/Python H2O client and then connect to your database:
https://aichamp.wordpress.com/2017/03/20/building-h2o-glm-model-using-postgresql-database-and-jdbc-driver/
The article above is for Postgres, but you can use the same steps with SQL Server and an appropriate driver.

For Windows, remember to use ; instead of : in the -cp argument.
java -Xmx4g -cp sqljdbc42.jar;h2o.jar water.H2OApp -port 3333
water.H2OApp is the main class in h2o.jar.
Important note: SQL Server is not supported so far (as of August 2017).
You may use MariaDB to load datasets:
From Windows console:
java -Xmx4G -cp mariadb-java-client-2.1.0.jar;h2o.jar water.H2OApp -port 3333
Note: for Linux, replace ";" with ":".
From R:
sqlConn <- "jdbc:mariadb://10.106.7.46:3306/DBName"
userName <- "dbuser"
userPass <- "dbpass."
sql_Query <- "SELECT * FROM dbname.tablename;"
mydata <- h2o.import_sql_select( sqlConn, sql_Query, userName, userPass )
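For reference, the same workflow from the Python H2O client should look roughly like this (a minimal sketch; the connection URL, query and credentials are simply the placeholders from the R example above, and the H2O instance is assumed to have been started with the MariaDB jar on its classpath as shown earlier):
import h2o

# Attach to the H2O instance that was started with "-port 3333" and the MariaDB driver on -cp
# (on Linux: java -Xmx4G -cp mariadb-java-client-2.1.0.jar:h2o.jar water.H2OApp -port 3333)
h2o.connect(ip="localhost", port=3333)

sql_conn  = "jdbc:mariadb://10.106.7.46:3306/DBName"
sql_query = "SELECT * FROM dbname.tablename;"

# Run the query through the JDBC driver and load the result as an H2OFrame
my_data = h2o.import_sql_select(sql_conn, sql_query, "dbuser", "dbpass.")
print(my_data.head())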

Related

How to install JDBC driver on Databricks Cluster?

I'm trying to get the data from my Oracle Database to a Databricks Cluster. But I think I'm doing it wrong:
In the cluster libraries I just installed ojdbc8.jar, and after that I opened a notebook and did this to connect:
CREATE TABLE oracle_table
USING org.apache.spark.sql.jdbc
OPTIONS (
dbtable 'table_name',
driver 'oracle.jdbc.driver.OracleDriver',
user 'username',
password 'pasword',
url 'jdbc:oracle:thin://#<hostname>:1521/<db>')
And it says:
java.sql.SQLException: Invalid Oracle URL specified
Can someone help? I've been reading the documentation but there are no clear instructions on how I should actually install this jar step by step. Am I using the wrong jar? Thanks!
I have managed to set this up in Python/PySpark as follows:
jdbcUrl = "jdbc:oracle:thin:@//hostName:port/databaseName"
connectionProperties = {
"user" : username,
"password" : password,
"driver" : "oracle.jdbc.driver.OracleDriver"
}
query = "(select * from mySchema.myTable )"
df = spark.read.jdbc(url=jdbcUrl, table=query, properties=connectionProperties)
I am using the Oracle JDBC Thin Driver instantclient-basic-linux.x64-21.5.0.0.0, as available on the Oracle webpages. The current version is 21.7 I think, but it should work the same way.
Check this link to understand the two different notations for jdbc URLs
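For illustration, these are the two notations in question (a minimal sketch; hostName, 1521, mySID and myService are placeholders, not values from the question):
# SID notation:          jdbc:oracle:thin:@<host>:<port>:<SID>
# Service-name notation: jdbc:oracle:thin:@//<host>:<port>/<service_name>
jdbcUrl_sid     = "jdbc:oracle:thin:@hostName:1521:mySID"
jdbcUrl_service = "jdbc:oracle:thin:@//hostName:1521/myService"
# Either form can be passed as jdbcUrl to the spark.read.jdbc call above,
# depending on whether the database is addressed by SID or by service name.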

cx_Oracle connection by python3.5

I've made several attempts to connect to the Oracle DB but am still unable to connect. Below is my code. However, I can connect to the Oracle DB through the terminal like this:
$ sqlplus64 uid/passwd@192.168.0.5:1521/WSVC
My environment: Ubuntu 16.04 / 64-bit / Python 3.5
Any knowledge or experience with this issue would be appreciated. Thank you.
import os
os.chdir("/usr/lib/oracle/12.2/client64/lib")
import cx_Oracle
# 1st attempt
ip = '192.168.0.5'
port = 1521
SID = 'WSVC'
dsn_tns = cx_Oracle.makedsn(ip, port, SID)
# dsn_tns = cx_Oracle.makedsn(ip, port, service_name=SID)
db = cx_Oracle.connect('uid', 'passwd', dsn_tns)
cursor = db.cursor()
# -------------------------------------------------
# 2nd attempt
conn = "uid/passwd@(DESCRIPTION=(SOURCE_ROUTE=OFF)(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.0.5)(PORT=1521)))(CONNECT_DATA=(SID=WSVC)(SRVR=DEDICATED)))"
db = cx_Oracle.connect(conn)
cursor = db.cursor()
# ------------------------------------------------------
# ERROR Description
cx_Oracle.InterfaceError: Unable to acquire Oracle environment handle
The error "unable to acquire Oracle environment handle" is due to your Oracle configuration being incorrect. A few things that should help you uncover the source of the problem:
when using Instant Client, do NOT set the environment variable ORACLE_HOME; that should only be set when using a full Oracle Client or Oracle Database installation
the value of LD_LIBRARY_PATH should contain the path which contains libclntsh.so; the value you selected looks like it is incorrect and should be /usr/lib/oracle/12.2/client64/lib instead
you can verify which Oracle Client libraries are being loaded by using the ldd command as in ldd cx_Oracle.cpython-35m-x86_64-linux-gnu.so
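Putting those points together, a minimal sketch of a working connection (the host, port, service name and credentials are the ones from the question; LD_LIBRARY_PATH is assumed to be exported to /usr/lib/oracle/12.2/client64/lib before Python starts, and ORACLE_HOME is left unset):
import cx_Oracle

# No os.chdir() and no ORACLE_HOME needed; the Instant Client is found via LD_LIBRARY_PATH.
# WSVC is passed as a service name here (as in the commented-out line in the question);
# use sid="WSVC" instead if it really is a SID.
dsn = cx_Oracle.makedsn("192.168.0.5", 1521, service_name="WSVC")
db = cx_Oracle.connect("uid", "passwd", dsn)

cursor = db.cursor()
cursor.execute("SELECT 1 FROM dual")
print(cursor.fetchone())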

Connect to Google BigQuery in R using Simba JDBC driver

I cannot connect to my Google BigQuery dataset via the Simba JDBC driver.
I want to connect from an R application using the RJDBC package. I set the parameters as follows:
library(RJDBC)
driver <- JDBC(driverClass = "com.simba.googlebigquery.jdbc42.Driver", classPath = "~/JDBC/GoogleBigQueryJDBC42.jar", identifier.quote = "'")
conn <- dbConnect(driver,"jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=My_project_Id;OAuthType=1;")
but I receive an error saying:
Error in .jcall(drv#jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.lang.NoClassDefFoundError: com/google/api/client/json/JsonFactory
Please tell me what I am doing wrong?
I found the problem: I should have added the required libraries to the Java class path. So in R I executed the following commands:
.jaddClassPath("jackson-core-2.1.3.jar")
.jaddClassPath("google-oauth-client-1.22.0.jar")
.jaddClassPath("google-http-client-jackson2-1.22.0.jar")
.jaddClassPath("google-http-client-1.22.0.jar")
.jaddClassPath("GoogleBigQueryJDBC41.jar")
.jaddClassPath("google-api-services-bigquery-v2-rev320-1.22.0.jar")
.jaddClassPath("google-api-client-1.22.0.jar")

How to connect to Teradata from pyspark?

I am trying to connect to Teradata and DB2 from Pyspark.
I am using the jars below:
tdgssconfig-15.10.00.14.jar
teradata-connector-1.4.1.jar
terajdbc4-15.10.00.14.jar
&
db2jcc4.jar
Connection string:
df1 = sqlContext.load(source="jdbc", driver="com.teradata.jdbc.TeraDriver", url=db_url,user="db_user",TMODE="TERA",password="db_pwd",dbtable="U114473.EMPLOYEE")
df = sqlContext.read.format('jdbc').options(url='jdbc:db2://10.123.321.9:50000/DB599641',user='******',password='*****',driver='com.ibm.db2.jcc.DB2Driver', dbtable='DSN1.EMPLOYEE')
Both give me a "Driver not found" error.
Can we use JDBC drivers with pyspark?
As James Tobin said, use the pyspark2 --jars /jarpath option when you start your pyspark session or when you submit your .py to Spark.
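For example, a minimal sketch of a Teradata read once the jars are on the classpath (td_host and the DATABASE value are placeholders; the URL follows the usual jdbc:teradata://host/DATABASE=...,TMODE=TERA format, and the driver class and table come from the question):
# Start the session with the jars, e.g.:
#   pyspark2 --jars terajdbc4-15.10.00.14.jar,tdgssconfig-15.10.00.14.jar
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("teradata-read").getOrCreate()

db_url = "jdbc:teradata://td_host/DATABASE=U114473,TMODE=TERA"

df = (spark.read.format("jdbc")
      .option("url", db_url)
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "U114473.EMPLOYEE")
      .option("user", "db_user")
      .option("password", "db_pwd")
      .load())
df.show(5)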

Connecting to Oracle DB using Ruby

I am stuck connecting to an Oracle DB; I have read lots of material but with no results.
I have a remote Oracle DB, and I connect to it using DBVisualizer with the connection set up like this:
DB Type : Oracle
Driver (jdbc) : Oracle thin
Database URL: jdbc:oracle:thin:@10.10.100.10:1521/VVV.LOCALDOMAIN
UserIdf: SomeUser
Pass: SomePass
Connection works ok.
What I do in Ruby is:
require 'oci8'
require 'dbi'
...
conn = OCI8.new('SomeUser','SomePass','//10.10.100.10:1521/VVV.LOCALDOMAIN')
...
What I get is:
ORA-12545: Connect failed because target host or object does not exist
oci8.c:360:in oci8lib.so
The third parameter needs to be the TNS connect identifier; if you use SQL*Plus it is also the third part of the connect string, and you can also find it in the tnsnames.ora file in the Oracle directories.
in SQL*Plus: connect user/password@hostname;
in oci8: conn = OCI8.new('SomeUser','SomePass',hostname)
Here is a working sample, with the parameters obfuscated of course:
require 'oci8'
oci = OCI8.new('****','***','****.***')
oci.exec('select * from table') do |record|
puts record.join(',')
end
