Connecting to Oracle DB from PySpark 3.1.2 fails with Py4JJavaError

I am trying out PySpark 3.2.1 with Oracle 11g. It fails with the following error:
Py4JJavaError: An error occurred while calling o44.load.
: java.lang.ClassNotFoundException: oracle.jdbc.OracleDriver
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
My code:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PySpark_Oracle_Connection").getOrCreate()

driver = 'oracle.jdbc.OracleDriver'
url = 'jdbc:oracle:thin:@hostname:port/dbTEST'  # '@' separates the credentials from the host
user = 'myname'
password = 'mypswd'
table = 'mytable'
SPARK_CLASS_PATH = r"C:\Oracle_Client\jdbc\lib\ojdbc8.jar"  # note: defined but never passed to Spark

df = spark.read.format('jdbc') \
    .option('driver', driver) \
    .option('url', url) \
    .option('dbtable', table) \
    .option('user', user) \
    .option('password', password) \
    .load()
I'd appreciate some quick help, please. I have gone through previous posts, but it still doesn't work.

Py4JJavaError: An error occurred while calling o44.load.
: java.lang.ClassNotFoundException: oracle.jdbc.OracleDriver
The error itself points to the root cause: the oracle.jdbc.OracleDriver class was not found when Spark tried to read from your Oracle table.
So you just have to tell Spark where to find your jar. You can do this by editing the spark-defaults.conf file, which should be present in the $SPARK_HOME/conf/ directory. If it is not present, create it yourself with the following config:
spark.driver.extraClassPath C:\Oracle_Client\jdbc\lib\ojdbc8.jar
spark.executor.extraClassPath C:\Oracle_Client\jdbc\lib\ojdbc8.jar
Or just pass the --jars option when submitting the job.
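If you prefer to keep everything in the script, you can also attach the jar while building the session, before the JVM starts. A minimal sketch, assuming the jar path from the question (spark.jars is a standard Spark configuration key):
from pyspark.sql import SparkSession

# Attach the Oracle JDBC jar at session creation time;
# the path below is taken from the question - adjust to your install.
spark = SparkSession.builder \
    .appName("PySpark_Oracle_Connection") \
    .config("spark.jars", r"C:\Oracle_Client\jdbc\lib\ojdbc8.jar") \
    .getOrCreate()
When submitting a script instead, the equivalent is spark-submit --jars C:\Oracle_Client\jdbc\lib\ojdbc8.jar your_script.py.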

driver = 'oracle.jdbc.OracleDriver'
is the old class name; with ojdbc8.jar you must use
driver = 'oracle.jdbc.driver.OracleDriver'
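Concretely, only the driver option changes; a sketch reusing the url, table, user, and password variables from the question:
# Same read as in the question, with the fully qualified driver class.
df = spark.read.format('jdbc') \
    .option('driver', 'oracle.jdbc.driver.OracleDriver') \
    .option('url', url) \
    .option('dbtable', table) \
    .option('user', user) \
    .option('password', password) \
    .load()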

Related

Error while importing table through Sqoop in HDP 2.3.2 sandbox

I am trying to import a 70+ GB table into Hive on the HDP 2.3.2 sandbox. I have established a connection between the SQL Server and the sandbox, but while trying to import the table using the following command:
sudo -u hdfs sqoop import --connect "jdbc:sqlserver://XX.XX.XX.XX;database=XX;username=XX;password=XX" --table XX --split-by ID --target-dir "/user/hdfs/Kunal/2" --hive-import -- --schema dbo
it gives me the following error:
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.UnsupportedOperationException: Java Runtime Environment (JRE) version 1.7 is not supported by this driver. Use the sqljdbc4.jar class library, which provides support for JDBC 4.0.
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:167)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: java.lang.UnsupportedOperationException: Java Runtime Environment (JRE) version 1.7 is not supported by this driver. Use the sqljdbc4.jar class library, which provides support for JDBC 4.0.
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:220)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:165)
... 9 more
Option 1: Use a single mapper (-m 1); note the single - symbol. The whole 70 GB will then be read in a single thread, so the import may take a long time to complete, and the output may end up in a single HDFS file.
Option 2: Use --split-by with an evenly distributed column; --split-by names the column of the table used to split work units. For example, employee_id in an emp table would be unique and evenly distributed.
Refer to the latest Sqoop user guide: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html
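For Option 1, the question's command would simply gain the -m 1 flag and no longer needs --split-by (a sketch; the XX placeholders are kept from the question):
sudo -u hdfs sqoop import -m 1 --connect "jdbc:sqlserver://XX.XX.XX.XX;database=XX;username=XX;password=XX" --table XX --target-dir "/user/hdfs/Kunal/2" --hive-import -- --schema dbo
Note that the question's command already uses --split-by ID, so for Option 2 the fix is to pick a column whose values are spread evenly across their range, rather than to add the flag.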

HiveServer Class Not Found Exception

I'm trying to run Hive from the command prompt and it works absolutely fine. But when I try to run HiveServer using the "hive --service hiveserver" command, I get the following exception:
Starting Hive Thrift Server
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.hive.service.HiveServer
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I then tried the command "hive --service hiveserver2", but I still haven't found a solution.
Can anybody please suggest a solution for this problem?
Maybe another process (another HiveServer) is already listening on port 10000.
You can check with:
netstat -ntulp | grep ':10000'
If a process is found, kill it; otherwise, start the server on another port.
By the way, which version are you using?
This error occurred for me when Hadoop couldn't find hive-service-*.jar on its classpath. Just copy hive-service-*.jar to your Hadoop lib folder, or export the classpath in hadoop-env.sh. I have shown how to add the classpath below.
Add this line in hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/hive/lib/hive-*.jar
I have used /usr/local/hive as the Hive path since I have Hive installed at that location. Change it to point to your Hive installation.
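After editing hadoop-env.sh, you can verify that the Hive jars actually show up on the classpath (hadoop classpath is a standard Hadoop command; the grep pattern is only illustrative):
hadoop classpath | tr ':' '\n' | grep hive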

Using Phoenix to help integrate Elasticsearch and HBase: error when creating a table with sqlline.py

I am following the instructions in "Connecting Hbase to Elasticsearch in 10 min or less". Everything goes fine until the step "Create a table in HBase using SQLline". When I type $PHOENIX_HOME/hadoop1/bin/sqlline.py localhost, the terminal shows:
znbee#znbee-Aspire-V5-452G:~/phoenix-4.1.0-bin/hadoop1$ bin/sqlline.py localhost
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:localhost none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:localhost
14/12/19 11:35:03 WARN util.Tracing: Tracing will outputs will not be written to any metrics sink! No TraceMetricsSink found on the classpath
java.lang.RuntimeException: Could not create interface org.apache.phoenix.trace.PhoenixSpanReceiver Is the hadoop compatibility jar on the classpath?
at org.apache.hadoop.hbase.CompatibilityFactory.getInstance(CompatibilityFactory.java:60)
at org.apache.phoenix.trace.TracingCompat.newTraceMetricSource(TracingCompat.java:40)
at org.apache.phoenix.trace.util.Tracing.addTraceMetricsSource(Tracing.java:294)
at org.apache.phoenix.jdbc.PhoenixConnection.<clinit>(PhoenixConnection.java:125)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$9.call(ConnectionQueryServicesImpl.java:1516)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$9.call(ConnectionQueryServicesImpl.java:1489)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1489)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:162)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:129)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:133)
at sqlline.SqlLine$DatabaseConnection.connect(SqlLine.java:4650)
at sqlline.SqlLine$DatabaseConnection.getConnection(SqlLine.java:4701)
at sqlline.SqlLine$Commands.connect(SqlLine.java:3942)
at sqlline.SqlLine$Commands.connect(SqlLine.java:3851)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at sqlline.SqlLine$ReflectiveCommandHandler.execute(SqlLine.java:2810)
at sqlline.SqlLine.dispatch(SqlLine.java:817)
at sqlline.SqlLine.initArgs(SqlLine.java:633)
at sqlline.SqlLine.begin(SqlLine.java:680)
at sqlline.SqlLine.mainWithInputRedirection(SqlLine.java:441)
at sqlline.SqlLine.main(SqlLine.java:424)
Caused by: java.util.NoSuchElementException
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:357)
at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
at org.apache.hadoop.hbase.CompatibilityFactory.getInstance(CompatibilityFactory.java:46)
... 24 more

Pentaho Data Integration with Hive connection

I am using Pentaho Data Integration and I am trying to connect to Hive, but when I do so, I get the error below:
Error connecting to database [Hive] : org.pentaho.di.core.exception.KettleDatabaseException:
Error occured while trying to connect to the database
Error connecting to database: (using class org.apache.hadoop.hive.jdbc.HiveDriver)
org.apache.thrift.transport.TTransportException
org.pentaho.di.core.exception.KettleDatabaseException:
Error occured while trying to connect to the database
Error connecting to database: (using class org.apache.hadoop.hive.jdbc.HiveDriver)
org.apache.thrift.transport.TTransportException
at org.pentaho.di.core.database.Database.normalConnect(Database.java:428)
at org.pentaho.di.core.database.Database.connect(Database.java:361)
at org.pentaho.di.core.database.Database.connect(Database.java:314)
at org.pentaho.di.core.database.Database.connect(Database.java:302)
at org.pentaho.di.core.database.DatabaseFactory.getConnectionTestReport(DatabaseFactory.java:80)
at org.pentaho.di.core.database.DatabaseMeta.testConnection(DatabaseMeta.java:2685)
at org.pentaho.di.ui.core.database.dialog.DatabaseDialog.test(DatabaseDialog.java:109)
at org.pentaho.di.ui.core.database.wizard.CreateDatabaseWizardPage2.test(CreateDatabaseWizardPage2.java:157)
at org.pentaho.di.ui.core.database.wizard.CreateDatabaseWizardPage2$3.widgetSelected(CreateDatabaseWizardPage2.java:147)
at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
at org.eclipse.jface.window.Window.runEventLoop(Window.java:820)
at org.eclipse.jface.window.Window.open(Window.java:796)
at org.pentaho.di.ui.core.database.wizard.CreateDatabaseWizard.createAndRunDatabaseWizard(CreateDatabaseWizard.java:111)
I am using host localhost, port 8888, and database default.
Kindly help; awaiting your reply.
Regards,
Jiten Pansara
What Hadoop distribution are you using? If you are not using Apache Hadoop 0.20.x, then you will have to configure PDI by setting certain properties; see the following Wiki page for more details on how to set up Pentaho for a particular Hadoop distribution:
http://wiki.pentaho.com/display/BAD/Configuring+Pentaho+for+your+Hadoop+Distro+and+Version
Did you edit plugin.properties in the plugin folder?
data-integration > plugins > pentaho-big-data-plugin > plugin.properties
Change the property "active.hadoop.configuration" to the Hadoop distribution you are using, e.g.:
active.hadoop.configuration=hdp20
This might solve the issue.
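The valid values are the shim folder names shipped with the plugin; on a stock PDI install you can list them (a sketch; the exact folder names depend on your PDI version):
ls data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations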

Connect to Oracle using Slick

I am trying to connect to Oracle using Slick.
I got the slick-extensions_2.10-1.0.0.jar.
I added the lines below in Scala:
Database.forURL("jdbc:oracle:thin:#myhost:myport:dbalias", "myid", "mypwd", null,
  driver = "com.typesafe.slick.driver.oracle.OracleDriver") withSession { ... }
What is the right URL to use for this driver? I got the following error:
Exception in thread "main" java.sql.SQLException: No suitable driver found for jdbc:oracle:thin:#myhost:myport:dbalias
at java.sql.DriverManager.getConnection(Unknown Source)
at java.sql.DriverManager.getConnection(Unknown Source)
at scala.slick.session.Database$$anon$2.createConnection(Database.scala:105)
at scala.slick.session.BaseSession.conn$lzycompute(Session.scala:207)
at scala.slick.session.BaseSession.conn(Session.scala:207)
at scala.slick.session.BaseSession.close(Session.scala:221)
at scala.slick.session.Database.withSession(Database.scala:38)
at scala.slick.session.Database.withSession(Database.scala:46)
It seems you did not make the Oracle JDBC driver available on the classpath when running your program.
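For example, when running the program directly with scala, the Oracle driver jar has to sit on the classpath alongside the Slick jars. A sketch under stated assumptions (ojdbc6.jar, the era-appropriate Oracle driver, and the main class name MyApp are illustrative):
scala -cp ojdbc6.jar:slick_2.10-1.0.0.jar:slick-extensions_2.10-1.0.0.jar:. MyApp
With sbt, dropping ojdbc6.jar into the project's lib/ directory puts it on the classpath as an unmanaged dependency.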
