Spark - Oracle timezone error

I am running a Spark job that loads data to Oracle, but I am getting the following error:
java.sql.SQLException: ORA-00604: error occurred at recursive SQL level 1
ORA-01882: timezone region not found
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:392)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:385)
at oracle.jdbc.driver.T4CTTIfun.processError(T4CTTIfun.java:1018)
at oracle.jdbc.driver.T4CTTIoauthenticate.processError(T4CTTIoauthenticate.java:501)
Here is what I have in my code
val oracleProps = new java.util.Properties()
oracleProps.put("driver", oracleDriver)
oracleProps.put("driver", oracleDriver)
oracleProps.put("user", oracleUser)
oracleProps.put("password", oraclePwd)
oracleProps.put("batchsize", oracleBatchSize)
dataframe.write.mode("overwrite").jdbc(oracleUrl, oracleBaseTable, oracleProps)
The same code works from Spark-Shell but not from spark-submit.
The same spark-submit works on other clusters.
Appreciate your help!

I had this error ("ORA-01882: timezone region not found") with a PySpark Oracle JDBC connection. I was able to connect after setting oracle.jdbc.timezoneAsRegion to false.
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0.
JDBC driver used - ojdbc8.jar
df.write \
    .format("jdbc") \
    .option("url", "JDBC_URL") \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .option("oracle.jdbc.timezoneAsRegion", "false") \
    .option("dbtable", "SCHEMA.TABLE") \
    .option("user", "USERID") \
    .option("password", "PASSWORD") \
    .mode("overwrite") \
    .save()
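The question's Scala snippet passes options through a java.util.Properties object; the same flag can be supplied there as an extra connection property, since Spark forwards those entries to the JDBC driver. A minimal PySpark sketch of that variant, reusing the names (oracleUrl, oracleBaseTable, oracleUser, oraclePwd, dataframe) from the question:
oracleProps = {
    "driver": "oracle.jdbc.driver.OracleDriver",
    "user": oracleUser,
    "password": oraclePwd,
    # Extra entries are handed to the Oracle driver as connection properties,
    # so this disables the timezone-region lookup behind ORA-01882.
    "oracle.jdbc.timezoneAsRegion": "false",
}
dataframe.write.mode("overwrite").jdbc(oracleUrl, oracleBaseTable, properties=oracleProps)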

I wrote a program to insert data from a file into an Oracle database using Spark [version 2.3.0.cloudera3]. The Oracle database version is "Oracle Database 11g Enterprise Edition Release 11.2.0.1.0".
I was using the Oracle JDBC driver ojdbc8.jar and encountered the following problem:
java.sql.SQLException: ORA-00604: error occurred at recursive SQL level 1
ORA-01882: timezone region not found.
I then changed the Oracle JDBC driver to ojdbc6.jar, which is compatible with Oracle 11.2.0.1.0, and now it works perfectly.

Related

Connecting Glue PySpark to Oracle using an SSL certificate

I am using Spark read/write operations for reading from and writing to an Oracle database.
Below is the code snippet:
empDF = spark.read \
.format("jdbc") \
.option("url", url) \
.option("driver", "oracle.jdbc.driver.OracleDriver") \
.option("ssl", True) \
.option("sslmode", "require" ) \
.option("dbtable", query) \
.option("user", "******") \
.option("password", "******") \
.load()
But I need to add the Oracle SSL certificate to connect to the database. I tried using a wallet, which I added to the /tmp location along with the tnsnames.ora file, and referenced it in the URL in the format below.
url = "jdbc:oracle:thin:#apm_url?TNS_ADMIN=/tmp"
But I am still getting the error below and am not able to connect:
An error occurred while calling o104.load. IO Error: IO Error PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target, connect lapse 30 ms., Authentication lapse 0 ms.
What version of the Oracle JDBC driver are you using? Check out the QuickStart guide for using Oracle wallets. You need to have oraclepki.jar, osdt_core.jar, and osdt_cert.jar on the classpath.
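A rough, unverified sketch of what that can look like from PySpark once those jars are installed, keeping the question's /tmp wallet location and apm_url alias (the oracle.net.wallet_location value below is an assumption about how the wallet is laid out, and note the @ in the URL rather than the # in the question's URL):
# Assumes tnsnames.ora and an auto-login wallet (cwallet.sso) are unpacked under /tmp,
# and that oraclepki.jar, osdt_core.jar, and osdt_cert.jar are on the classpath.
empDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:oracle:thin:@apm_url?TNS_ADMIN=/tmp") \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .option("oracle.net.wallet_location", "(SOURCE=(METHOD=FILE)(METHOD_DATA=(DIRECTORY=/tmp)))") \
    .option("dbtable", query) \
    .option("user", "******") \
    .option("password", "******") \
    .load()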

How to install JDBC driver on Databricks Cluster?

I'm trying to get data from my Oracle database into a Databricks cluster, but I think I'm doing it wrong.
In the cluster libraries I just installed ojdbc8.jar, and after that I opened a notebook and did this to connect:
CREATE TABLE oracle_table
USING org.apache.spark.sql.jdbc
OPTIONS (
dbtable 'table_name',
driver 'oracle.jdbc.driver.OracleDriver',
user 'username',
password 'password',
url 'jdbc:oracle:thin://#<hostname>:1521/<db>')
And it says:
java.sql.SQLException: Invalid Oracle URL specified
Can someone help? I've been reading the documentation but there's no clear instruction on how I should actually install this jar step by step. Am I using the wrong jar? Thanks!
I have managed to set this up in Python/PySpark as follows:
jdbcUrl = "jdbc:oracle:thin:#//hostName:port/databaseName"
connectionProperties = {
"user" : username,
"password" : password,
"driver" : "oracle.jdbc.driver.OracleDriver"
}
query = "(select * from mySchema.myTable )"
df = spark.read.jdbc(url=jdbcUrl, table=query, properties=connectionProperties)
I am using the Oracle JDBC Thin Driver instantclient-basic-linux.x64-21.5.0.0.0, as available on the Oracle webpages. The current version is 21.7 I think, but it should work the same way.
Check this link to understand the two different notations for JDBC URLs.
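For reference, the two notations differ in whether they address the database by SID or by service name; the host, port, SID, and service name below are placeholders:
# Older SID-based notation (host:port:SID):
sid_url = "jdbc:oracle:thin:@hostName:1521:mySID"
# Service-name notation (note the // before the host and the / before the service):
service_url = "jdbc:oracle:thin:@//hostName:1521/myServiceName"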

How to load a table from SQL Server using H2O in R?

I am trying to load a table into R using h2o but got the following error:
my_data <- h2o.import_sql_table(my_sql_conn, table, username, password)
ERROR: Unexpected HTTP Status code: 500 Server Error (url = http://localhost:54321/99/ImportSQLTable)
java.lang.RuntimeException [1] "java.lang.RuntimeException: SQLException: No suitable driver found for jdbc:mysql://10.140.20.29/MySQL?&useSSL=false\nFailed to connect and read from SQL database with connection_url: jdbc:mysql://10.140.20.29/MySQL?&useSSL=false"
Can someone help me with this? Thank you so much!
You need a supported JDBC driver (built on JDBC 4.2 Core) to connect from H2O to SQL Server. You can download the Microsoft JDBC Driver 4.2 for SQL Server from the link below first:
https://www.microsoft.com/en-us/download/details.aspx?id=54671
After that, please follow the article below to first test the JDBC driver from the R/Python H2O client and then connect to your database:
https://aichamp.wordpress.com/2017/03/20/building-h2o-glm-model-using-postgresql-database-and-jdbc-driver/
The article above is for Postgres, but you can use it with SQL Server by substituting the appropriate driver.
For Windows, remember to use ; instead of : as the separator in the -cp argument.
java -Xmx4g -cp sqljdbc42.jar;h2o.jar water.H2OApp -port 3333
water.H2OApp is the main class in h2o.jar.
Important note: SQL Server is not supported so far (as of August 2017).
You may use MariaDB to load datasets:
From Windows console:
java -Xmx4G -cp mariadb-java-client-2.1.0.jar;h2o.jar water.H2OApp -port 3333
Note: for Linux, replace ";" with ":".
From R:
sqlConn <- "jdbc:mariadb://10.106.7.46:3306/DBName"
userName <- "dbuser"
userPass <- "dbpass."
sql_Query <- "SELECT * FROM dbname.tablename;"
mydata <- h2o.import_sql_select( sqlConn, sql_Query, userName, userPass )

Sqoop error for java.io.CharConversionException caused by a non-UTF-8 character

I was trying to Sqoop-import data from an IBM DB2 database but got stuck with this error:
java.io.CharConversionException: SQL exception in nextKeyValue
And caused by [jcc][t4][1065]..... Caught java.io.CharConversionException ERRORCODE=-4220, SQLSTATE=null
I've tried
sqoop import --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://host:port/db --verbose table.views_data -m 1 --target-dir /tmp/data
It sounds like there is a bad character in the table you're loading per this IBM article: http://www-01.ibm.com/support/docview.wss?uid=swg21684365
If you want to try to work around it without fixing the data as suggested above, the DataDirect DB2 JDBC driver has a property to override the code page with one of these values: http://media.datadirect.com/download/docs/jdbc/alljdbc/help.html#page/jdbcconnect%2Fcodepageoverride.html%23

How to connect to Teradata from pyspark?

I am trying to connect to Teradata and DB2 from Pyspark.
I am using the jars below:
tdgssconfig-15.10.00.14.jar
teradata-connector-1.4.1.jar
terajdbc4-15.10.00.14.jar
&
db2jcc4.jar
Connection strings:
df1 = sqlContext.load(source="jdbc", driver="com.teradata.jdbc.TeraDriver", url=db_url,user="db_user",TMODE="TERA",password="db_pwd",dbtable="U114473.EMPLOYEE")
df = sqlContext.read.format('jdbc').options(url='jdbc:db2://10.123.321.9:50000/DB599641',user='******',password='*****',driver='com.ibm.db2.jcc.DB2Driver', dbtable='DSN1.EMPLOYEE')
Both give me a "Driver not found" error.
Can we use JDBC drivers with PySpark?
As James Tobin said, use the pyspark2 --jars /jarpath option when you start your PySpark session, or pass the same option when you submit your .py to Spark.
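A minimal sketch of that approach, assuming the jar paths shown in the comment (adjust them to wherever the Teradata and DB2 jars actually live) and a Spark 2.x session:
# Start the session (or spark-submit) with the driver jars attached, e.g.:
#   pyspark2 --jars /path/terajdbc4-15.10.00.14.jar,/path/tdgssconfig-15.10.00.14.jar,/path/db2jcc4.jar
# With the jars on the classpath, the same reads should find the drivers:
td_df = spark.read.format("jdbc") \
    .option("url", db_url) \
    .option("driver", "com.teradata.jdbc.TeraDriver") \
    .option("dbtable", "U114473.EMPLOYEE") \
    .option("user", "db_user") \
    .option("password", "db_pwd") \
    .load()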
