Spark - Oracle timezone error

I am running a Spark job that loads data to Oracle, but I am getting the following error:
java.sql.SQLException: ORA-00604: error occurred at recursive SQL level 1
ORA-01882: timezone region not found
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:392)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:385)
at oracle.jdbc.driver.T4CTTIfun.processError(T4CTTIfun.java:1018)
at oracle.jdbc.driver.T4CTTIoauthenticate.processError(T4CTTIoauthenticate.java:501)
Here is what I have in my code
val oracleProps = new java.util.Properties()
oracleProps.put("driver", oracleDriver)
oracleProps.put("driver", oracleDriver)
oracleProps.put("user", oracleUser)
oracleProps.put("password", oraclePwd)
oracleProps.put("batchsize", oracleBatchSize)
dataframe.write.mode("overwrite").jdbc(oracleUrl, oracleBaseTable, oracleProps)
The same code works from Spark-Shell but not from spark-submit.
The same spark-submit works on other clusters.
Appreciate your help!

I had this error ("ORA-01882: timezone region not found") with a PySpark Oracle JDBC connection. I was able to connect after setting oracle.jdbc.timezoneAsRegion to false.
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0.
JDBC driver used - ojdbc8.jar
df.write \
    .format("jdbc") \
    .option("url", "JDBC_URL") \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .option("oracle.jdbc.timezoneAsRegion", "false") \
    .option("dbtable", "SCHEMA.TABLE") \
    .option("user", "USERID") \
    .option("password", "PASSWORD") \
    .mode("overwrite") \
    .save()
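The question's Scala snippet passes options through a java.util.Properties object; the same flag can be supplied there as an extra connection property, since Spark forwards those entries to the JDBC driver. A minimal PySpark sketch of that variant, reusing the names (oracleUrl, oracleBaseTable, oracleUser, oraclePwd, dataframe) from the question:
oracleProps = {
    "driver": "oracle.jdbc.driver.OracleDriver",
    "user": oracleUser,
    "password": oraclePwd,
    # Extra entries are handed to the Oracle driver as connection properties,
    # so this disables the timezone-region lookup behind ORA-01882.
    "oracle.jdbc.timezoneAsRegion": "false",
}
dataframe.write.mode("overwrite").jdbc(oracleUrl, oracleBaseTable, properties=oracleProps)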

I wrote a program to insert data from a file into an Oracle database using Spark [version 2.3.0.cloudera3]. The Oracle database version is "Oracle Database 11g Enterprise Edition Release 11.2.0.1.0".
I was using the Oracle JDBC driver ojdbc8.jar and encountered the following problem:
java.sql.SQLException: ORA-00604: error occurred at recursive SQL level 1
ORA-01882: timezone region not found.
I then changed the Oracle JDBC driver to ojdbc6.jar, which is compatible with Oracle 11.2.0.1.0, and now it works perfectly.

Related

Connecting Glue PySpark to Oracle using an SSL certificate

I am using Spark read/write operations for reading from and writing to an Oracle database.
Below is the code snippet:
empDF = spark.read \
.format("jdbc") \
.option("url", url) \
.option("driver", "oracle.jdbc.driver.OracleDriver") \
.option("ssl", True) \
.option("sslmode", "require" ) \
.option("dbtable", query) \
.option("user", "******") \
.option("password", "******") \
.load()
But I need to add the Oracle SSL certificate to connect to the database. I tried using a wallet, which I added to the /tmp location along with the tnsnames.ora file, and referenced it in the URL in the format below.
url = "jdbc:oracle:thin:#apm_url?TNS_ADMIN=/tmp"
But I am still getting the error below and am not able to connect:
An error occurred while calling o104.load. IO Error: IO Error PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target, connect lapse 30 ms., Authentication lapse 0 ms.
What version of the Oracle JDBC driver are you using? Check out the QuickStart guide for using Oracle wallets. You need to have oraclepki.jar, osdt_core.jar, and osdt_cert.jar on the classpath.
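A rough, unverified sketch of what that can look like from PySpark once those jars are installed, keeping the question's /tmp wallet location and apm_url alias (the oracle.net.wallet_location value below is an assumption about how the wallet is laid out, and note the @ in the URL rather than the # in the question's URL):
# Assumes tnsnames.ora and an auto-login wallet (cwallet.sso) are unpacked under /tmp,
# and that oraclepki.jar, osdt_core.jar, and osdt_cert.jar are on the classpath.
empDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:oracle:thin:@apm_url?TNS_ADMIN=/tmp") \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .option("oracle.net.wallet_location", "(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/tmp)))") \
    .option("dbtable", query) \
    .option("user", "******") \
    .option("password", "******") \
    .load()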

How to install JDBC driver on Databricks Cluster?

I'm trying to get data from my Oracle database into a Databricks cluster, but I think I'm doing it wrong.
In the cluster libraries I just installed ojdbc8.jar, and after that I opened a notebook and did this to connect:
CREATE TABLE oracle_table
USING org.apache.spark.sql.jdbc
OPTIONS (
dbtable 'table_name',
driver 'oracle.jdbc.driver.OracleDriver',
user 'username',
password 'password',
url 'jdbc:oracle:thin://#<hostname>:1521/<db>')
And it says:
java.sql.SQLException: Invalid Oracle URL specified
Can someone help? I've been reading the documentation but there's no clear instruction on how I should actually install this jar step by step. Am I using the wrong jar? Thanks!
I have managed to set this up in Python/PySpark as follows:
jdbcUrl = "jdbc:oracle:thin:#//hostName:port/databaseName"
connectionProperties = {
"user" : username,
"password" : password,
"driver" : "oracle.jdbc.driver.OracleDriver"
}
query = "(select * from mySchema.myTable )"
df = spark.read.jdbc(url=jdbcUrl, table=query, properties=connectionProperties)
I am using the Oracle JDBC Thin Driver instantclient-basic-linux.x64-21.5.0.0.0, as available on the Oracle webpages. The current version is 21.7 I think, but it should work the same way.
Check this link to understand the two different notations for JDBC URLs.
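For reference, the two notations differ in whether they address the database by SID or by service name; the host, port, SID, and service name below are placeholders:
# Older SID-based notation (host:port:SID):
sid_url = "jdbc:oracle:thin:@hostName:1521:mySID"
# Service-name notation (note the // before the host and the / before the service):
service_url = "jdbc:oracle:thin:@//hostName:1521/myServiceName"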

How to load a table from SQL Server using H2O in R?

I am trying to load a table into R using h2o but got the following error:
my_data <- h2o.import_sql_table(my_sql_conn, table, username, password)
ERROR: Unexpected HTTP Status code: 500 Server Error (url = http://localhost:54321/99/ImportSQLTable)
java.lang.RuntimeException [1] "java.lang.RuntimeException: SQLException: No suitable driver found for jdbc:mysql://10.140.20.29/MySQL?&useSSL=false\nFailed to connect and read from SQL database with connection_url: jdbc:mysql://10.140.20.29/MySQL?&useSSL=false"
Can someone help me with this? Thank you so much!
You need a supported JDBC driver (built on JDBC 4.2 Core) to connect from H2O to SQL Server. You can download the Microsoft JDBC Driver 4.2 for SQL Server from the link below first:
https://www.microsoft.com/en-us/download/details.aspx?id=54671
After that, please follow the article below to first test the JDBC driver from the R/Python H2O client and then connect to your database:
https://aichamp.wordpress.com/2017/03/20/building-h2o-glm-model-using-postgresql-database-and-jdbc-driver/
The article above is for Postgres, but you can use it with SQL Server by substituting the appropriate driver.
For Windows, remember to use ; instead of : as the separator in the -cp argument.
java -Xmx4g -cp sqljdbc42.jar;h2o.jar water.H2OApp -port 3333
water.H2OApp is the main class in h2o.jar.
Important note: SQL Server is not supported so far (as of August 2017).
You may use MariaDB to load datasets:
From Windows console:
java -Xmx4G -cp mariadb-java-client-2.1.0.jar;h2o.jar water.H2OApp -port 3333
Note: for Linux, replace ";" with ":".
From R:
sqlConn <- "jdbc:mariadb://10.106.7.46:3306/DBName"
userName <- "dbuser"
userPass <- "dbpass."
sql_Query <- "SELECT * FROM dbname.tablename;"
mydata <- h2o.import_sql_select( sqlConn, sql_Query, userName, userPass )

Sqoop error for java.io.CharConversionException caused by a non-UTF-8 character

I was trying to Sqoop-import data from an IBM DB2 database but got stuck with this error:
java.io.CharConversionException: SQL exception in nextKeyValue
And caused by [jcc][t4][1065]..... Caught java.io.CharConversionException ERRORCODE=-4220, SQLSTATE=null
I've tried
sqoop import --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://host:port/db --verbose table.views_data -m 1 --target-dir /tmp/data
It sounds like there is a bad character in the table you're loading per this IBM article: http://www-01.ibm.com/support/docview.wss?uid=swg21684365
If you want to try to work around it without fixing the data as suggested above, the DataDirect DB2 JDBC driver has a property to override the code page with one of these values: http://media.datadirect.com/download/docs/jdbc/alljdbc/help.html#page/jdbcconnect%2Fcodepageoverride.html%23

How to connect to Teradata from pyspark?

I am trying to connect to Teradata and DB2 from Pyspark.
I am using the jars below:
tdgssconfig-15.10.00.14.jar
teradata-connector-1.4.1.jar
terajdbc4-15.10.00.14.jar
&
db2jcc4.jar
Connection strings:
df1 = sqlContext.load(source="jdbc", driver="com.teradata.jdbc.TeraDriver", url=db_url,user="db_user",TMODE="TERA",password="db_pwd",dbtable="U114473.EMPLOYEE")
df = sqlContext.read.format('jdbc').options(url='jdbc:db2://10.123.321.9:50000/DB599641',user='******',password='*****',driver='com.ibm.db2.jcc.DB2Driver', dbtable='DSN1.EMPLOYEE')
Both give me a "Driver not found" error.
Can we use JDBC drivers with PySpark?
As James Tobin said, use the pyspark2 --jars /jarpath option when you start your PySpark session, or pass the same option when you submit your .py to Spark.
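A minimal sketch of that approach, assuming the jar paths shown in the comment (adjust them to wherever the Teradata and DB2 jars actually live) and a Spark 2.x session:
# Start the session (or spark-submit) with the driver jars attached, e.g.:
#   pyspark2 --jars /path/terajdbc4-15.10.00.14.jar,/path/tdgssconfig-15.10.00.14.jar,/path/db2jcc4.jar
# With the jars on the classpath, the same reads should find the drivers:
td_df = spark.read.format("jdbc") \
    .option("url", db_url) \
    .option("driver", "com.teradata.jdbc.TeraDriver") \
    .option("dbtable", "U114473.EMPLOYEE") \
    .option("user", "db_user") \
    .option("password", "db_pwd") \
    .load()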
