Dataproc Sqoop job with Postgres throwing error: Trust anchor for certification path not found

I am trying to submit a Sqoop job to Dataproc to export data from a Postgres database, following this article: https://medium.com/google-cloud/migrate-oracle-data-to-bigquery-using-dataproc-and-sqoop-cd3863adde7b
It is erroring out with: org.postgresql.util.PSQLException: SSL error: java.security.cert.CertPathValidatorException: Trust anchor for certification path not found.
This is the command I am trying to submit (variables have been appropriately set):
gcloud dataproc jobs submit hadoop --cluster=sqoop-cluster --region=us-central1 \
  --class=org.apache.sqoop.Sqoop --jars=$libs \
  -- import -Dmapreduce.job.user.classpath.first=true -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect=$JDBC_STR --username=xxx --password=xxxx --driver=org.postgresql.Driver \
  --target-dir=$STAGING_BUCKET/$TABLE --table=$SCHEMA.$TABLE \
  --enclosed-by '\"' --escaped-by \" --fields-terminated-by '|' --null-string '' --null-non-string '' --as-textfile
The Postgres JDBC connection string is as follows (omitting ssl=true results in an hba_conf not found error):
JDBC_STR=jdbc:postgresql://xxxxx:5432/YYYY?ssl=true
The detailed error:
Job [63fb49544a1141f89f9a12960cc18e18] submitted.
Waiting for job output...
/usr/lib/hadoop/libexec//hadoop-functions.sh: line 2400: HADOOP_COM.GOOGLE.CLOUD.HADOOP.SERVICES.AGENT.JOB.SHIM.HADOOPRUNCLASSSHIM_USER: invalid variable name
/usr/lib/hadoop/libexec//hadoop-functions.sh: line 2365: HADOOP_COM.GOOGLE.CLOUD.HADOOP.SERVICES.AGENT.JOB.SHIM.HADOOPRUNCLASSSHIM_USER: invalid variable name
/usr/lib/hadoop/libexec//hadoop-functions.sh: line 2460: HADOOP_COM.GOOGLE.CLOUD.HADOOP.SERVICES.AGENT.JOB.SHIM.HADOOPRUNCLASSSHIM_OPTS: invalid variable name
2021-10-14 21:48:33,931 WARN tool.SqoopTool: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2021-10-14 21:48:34,128 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2021-10-14 21:48:34,156 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2021-10-14 21:48:34,176 WARN sqoop.ConnFactory: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2021-10-14 21:48:34,203 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
2021-10-14 21:48:34,217 INFO manager.SqlManager: Using default fetchSize of 1000
2021-10-14 21:48:34,217 INFO tool.CodeGenTool: Beginning code generation
2021-10-14 21:48:34,504 ERROR manager.SqlManager: Error executing statement: org.postgresql.util.PSQLException: SSL error: java.security.cert.CertPathValidatorException: Trust anchor for certification path not found.
org.postgresql.util.PSQLException: SSL error: java.security.cert.CertPathValidatorException: Trust anchor for certification path not found.
at org.postgresql.ssl.MakeSSL.convert(MakeSSL.java:64)
Any help is appreciated.
Thanks!

It seems that your PostgreSQL server has SSL enabled, but the client side (the Dataproc VMs) is not configured with the server certificate or its root CA.
With ssl=true the client will verify the server certificate, so you can use a Dataproc init action to import the server certificate into the Dataproc VMs' truststore:
# Copy the server certificate (exported from the Postgres server) from GCS.
gsutil cp gs://<my-bucket>/server.crt .
# Import it into the JVM's default truststore.
# If `JAVA_HOME` is not defined, try `/usr/lib/jvm/adoptopenjdk-8-hotspot-amd64`.
keytool -keystore $JAVA_HOME/lib/security/cacerts -alias postgresql -import -file server.crt
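If you save that snippet as an init action (for example gs://<my-bucket>/import-cert.sh; the script name here is just illustrative), you can attach it at cluster creation time. Note that keytool will prompt for the cacerts keystore password (the JDK default is changeit) unless you also pass -storepass and -noprompt.
gcloud dataproc clusters create sqoop-cluster --region=us-central1 --initialization-actions=gs://<my-bucket>/import-cert.sh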
If you don't want to verify the server certificate on the client side, and instead want the server to verify the client's hostname/IP and certificate, configure that on the server side and use sslmode=require in the connection string.
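For example (assuming a pgJDBC version that supports the sslmode parameter), reusing the hypothetical host and database from the question:
JDBC_STR=jdbc:postgresql://xxxxx:5432/YYYY?sslmode=require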
For a quick test with server certificate verification disabled on the client side, try this in the JDBC connection string:
?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory
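Applied to the connection string from the question, that would look like (quote it in the shell because of the &):
JDBC_STR='jdbc:postgresql://xxxxx:5432/YYYY?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory'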
See this doc for more information on configuring SSL for PostgreSQL. Also a similar question for reference.

Related

Cannot build cube with apache kylin

I have installed Apache Kylin in Hortonworks' HDP Sandbox image. Following this, I have connected Apache Kylin to our Microsoft SQL data warehouse, and when I try to build a cube the process fails.
19/06/25 15:35:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.3.0.1.0-187
19/06/25 15:35:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/06/25 15:35:54 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
19/06/25 15:35:54 INFO manager.SqlManager: Using default fetchSize of 1000
19/06/25 15:35:54 INFO tool.CodeGenTool: Beginning code generation
19/06/25 15:35:55 INFO manager.SqlManager: Executing SQL statement: SELECT `V_FACTTRANSACTION_CUBE`.`CUSTOMERFK` as `V_FACTTRANSACTION_CUBE_CUSTOMERFK`
...
19/06/25 15:35:55 ERROR manager.SqlManager: Error executing statement: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '`'.
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '`'.
It seems like Sqoop is generating the query incorrectly by using "`" instead of normal quotes. Is there any way I can configure Sqoop to use the correct syntax?
Use Sqoop 2 and make all SQL queries uppercase; it's a known issue in Kylin.

Sqoop's import-all-tables is not working

Hi, I am trying to import all tables from all schemas in an Oracle DB to HDFS.
This is my script:
sqoop-import-all-tables -Dmapreduce.job.user.classpath.first=true -Dhadoop.security.credential.provider.path=jceks://x.jceks --connect jdbc:oracle:thin:@x.x.x.x:1521/yyyy --username xxxx --password xxxx --warehouse-dir /data-warehouse/xxxx --as-avrodatafile --compression-codec snappy --autoreset-to-one-mapper
When I run this script, I don't get any error, but no job starts.
Output:
Warning: /usr/hdp/2.6.2.0-205/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
find: failed to restore initial working directory: Permission denied
18/08/11 08:32:51 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.6.2.0-205
18/08/11 08:32:51 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/08/11 08:32:51 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
18/08/11 08:32:51 INFO manager.SqlManager: Using default fetchSize of 1000
18/08/11 08:32:53 INFO manager.OracleManager: Time zone has been set to IST
It seems that the user configured in Sqoop does not have enough privileges to query and export the data from Oracle. Please check that you can connect to the Oracle database and query it from the command line.
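A quick way to check this from a shell, assuming the Oracle client (sqlplus) is available and reusing the placeholder credentials and host from the question:
sqlplus xxxx/xxxx@//x.x.x.x:1521/yyyy
SQL> SELECT owner, table_name FROM all_tables WHERE ROWNUM <= 5;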
Regards!

InvalidAlgorithmParameterException issue in Sqoop with SSL enabled for MS SQL database

I'm running Sqoop commands to import data from MS SQL with SSL enabled.
I have created the keystore and added the certificates to it. I'm using Sqoop version 1.4.6-cdh5.11.2.
Below is my Sqoop command:
sqoop import -Dfile.encoding=UTF-8 \
  --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" \
  --connect "jdbc:sqlserver://xxxxxx:1433;databaseName=xxx-example;encrypt=True;TrustServerCertificate=False;hostNameInCertificate=xxxxxx.xxxx.net;trustStore=/home/user1/trust.jks;trustStorePassword=xxxx" \
  --username User1 \
  --password 'xxxxx' \
  --null-string '\\N' \
  --null-non-string '\\N' -delete-target-dir \
  --target-dir "/home/john/PROGRAMS" \
  --table programs \
  --fields-terminated-by "\001" \
  --hive-drop-import-delims \
  --split-by 'ID' \
  --outdir 'temp/john/tables' \
  --bindir '/usr/john/PROGRAMS' -m 1
I've set encrypt to true, TrustServerCertificate to false, and trustStore and trustStorePassword to the keystore path and password:
encrypt=True;TrustServerCertificate=False
Below is the error that I get while running the Sqoop command:
18/01/22 10:12:10 INFO mapreduce.Job: Task Id : attempt_1516292804343_13212_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption. Error: "java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty".
at org.apache.sqoop.mapreduce.db.DBInputFormat.setDbConf(DBInputFormat.java:170)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:161)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption. Error: "java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty".
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:223)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setDbConf(DBInputFormat.java:168)
... 10 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption. Error: "java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty".
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1667)
at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1668)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:1323)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:991)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:827)
at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:1012)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getConnection(DBConfiguration.java:302)
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:216)
... 11 more
Caused by: javax.net.ssl.SSLException: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1906)
at sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1889)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1410)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1618)
How can I solve this issue?
I've tried adding -Djavax.net.ssl.trustStore=, but it didn't work.
The Microsoft doc says that for SSL to work at all, it is absolutely necessary to have the JSSE providers configured correctly for the JRE you are using. There are two steps to this:
1) Look at the java.security file in your JRE installation (typically found in the jre\lib\security directory). The installed security providers are listed in that file as security.provider.x=... where 'x' is the priority used. For Sun JRE installations, the first priority provider should be Sun's, e.g. you should have the line "security.provider.1=sun.security.provider.Sun" in that file. For other JREs, please refer to the JRE's documentation regarding their default provider name. We recommend when using the IBM JRE to specify "com.ibm.jsse.IBMJSSEProvider" as the first security provider to use.
2) Next, make sure that the classpath points to the correct JAR files (in the jre\lib directory) for use with those providers. For Sun, the classpath should include jsse.jar. For IBM, it should include ibmjsse.jar.
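A quick way to check step 1, assuming a JDK 8-style layout (adjust the path for your JRE):
grep '^security.provider' $JAVA_HOME/jre/lib/security/java.security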
Please refer to this Microsoft doc, which covers the same error you are getting ("The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption").
It looks like you have to pass the Java parameters below to the Sqoop job:
-Djavax.net.ssl.trustStore=C:\MyCertificates\storeName
-Djavax.net.ssl.trustStorePassword=storePassword
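In a Sqoop job those system properties must reach the map task JVMs, not just the client. One way to do that (a sketch, not verified against this exact cluster) is to pass them as generic Hadoop options, reusing the truststore path and password from the command in the question:
sqoop import -Dmapreduce.map.java.opts="-Djavax.net.ssl.trustStore=/home/user1/trust.jks -Djavax.net.ssl.trustStorePassword=xxxx" ...
The rest of the import arguments stay the same.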

sqoop to transfer data to HDFS from Teradata

I am using Sqoop to transfer data from Teradata to HDFS and I am getting the error below:
-bash-4.1$ sqoop import --connection-manager com.cloudera.sqoop.manager.DefaultManagerFactory --driver com.teradata.jdbc.TeraDriver \
--connect jdbc:teradata://dwsoat.dws.company.co.uk/DATABASE=TS_72258_BASELDB \
--username userid -P --table ADDRESS --num-mappers 3 \
--target-dir /user/nathalok/ADDRESS
Warning: /apps/cloudera/parcels/CDH-5.1.3-1.cdh5.1.3.p0.12/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/10/29 14:00:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4-cdh5.1.3
14/10/29 14:00:14 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/10/29 14:00:14 ERROR sqoop.ConnFactory: Sqoop wasn't able to create connnection manager properly. Some of the connectors supports explicit --driver and some do not. Please try to either specify --driver or leave it out.
14/10/29 14:00:14 ERROR tool.BaseSqoopTool: Got error creating database manager: java.io.IOException: java.lang.NoSuchMethodException: com.cloudera.sqoop.manager.DefaultManagerFactory.<init>(java.lang.String, com.cloudera.sqoop.SqoopOptions)
at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:165)
at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:243)
at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:84)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:494)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
Caused by: java.lang.NoSuchMethodException: com.cloudera.sqoop.manager.DefaultManagerFactory.<init>(java.lang.String, com.cloudera.sqoop.SqoopOptions)
at java.lang.Class.getConstructor0(Class.java:2810)
at java.lang.Class.getDeclaredConstructor(Class.java:2053)
at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:151)
... 9 more
-bash-4.1$
Any help will be appreciated.
To get Teradata working properly using a Cloudera distribution, you need to do the following:
Install the Teradata JDBC jars in /var/lib/sqoop. For me these were terajdbc4.jar and tdgssconfig.jar.
Install either the Cloudera Connector Powered by Teradata or the Cloudera Connector for Teradata somewhere on your filesystem (I prefer /var/lib/sqoop).
In /etc/sqoop/conf/managers.d/, create a file (of any name) and add com.cloudera.connector.teradata.TeradataManagerFactory=<location of connector jar>. For example, I have /etc/sqoop/conf/managers.d/teradata => com.cloudera.connector.teradata.TeradataManagerFactory=/var/lib/sqoop/sqoop-connector-teradata-1.2c5.jar.
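For example, step 3 can be done like this (the jar name and path are the ones from my setup above; adjust to wherever you installed the connector):
echo 'com.cloudera.connector.teradata.TeradataManagerFactory=/var/lib/sqoop/sqoop-connector-teradata-1.2c5.jar' | sudo tee /etc/sqoop/conf/managers.d/teradata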
There are different ways to install the Teradata connector as well. For example, it may be easier to use Cloudera Manager.
If you're still having trouble, try reaching out to the sqoop mailing list.

How to run Sqoop with custom JDBC Driver?

I can run Sqoop without providing the --driver parameter if I supply --connect/--username/--password for the Oracle thin driver.
But I need to get it running with the custom JDBC driver (it properly implements the java.sql.Driver interface) used in my project, instead of oracle.jdbc.OracleDriver.
I wasn't able to get it working by simply providing the --driver parameter.
And this suggestion wasn't helpful at all.
How to use Sqoop with custom DB access drivers?
How do I overcome the errors that I get?
If it has something to do with connection managers, could someone tell me which connection manager I should specify?
Thank you!
Here is what I'm actually trying to do:
./sqoop.sh import \
--fs $HDFS --jt $JT \
--connect <custom-connection-string> --username username --password password \
--table SYS.ALL_TABLES --split-by TABLE_NAME --target-dir /temp/try/110 --verbose \
--driver xx.xx.xx.MyDriver
I get an error:
ERROR manager.SqlManager: Error executing statement:
java.sql.SQLException: ORA-00933: SQL command not properly ended
More error info:
DEBUG tool.BaseSqoopTool: Enabled debug logging.
WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
DEBUG sqoop.ConnFactory: Loaded manager factory: com.cloudera.sqoop.manager.DefaultManagerFactory
WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
INFO manager.SqlManager: Using default fetchSize of 1000
INFO tool.CodeGenTool: Beginning code generation
INFO xx.xx.xx.MyDriver: xx.xx.xx.MyDriver registered successfully.
DEBUG manager.SqlManager: No connection paramenters specified. Using regular API for making connection.
INFO xx.xx.xx.MyDriver: Returning database connection
DEBUG manager.SqlManager: Using fetchSize for next query: 1000
INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SYS.ALL_TABLES AS t WHERE 1=0
ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: ORA-00933: SQL command not properly ended
java.sql.SQLException: ORA-00933: SQL command not properly ended
Your custom JDBC driver is being used correctly. The problem seems to be in the generic JDBC connector that is in use, which is producing an invalid query: the generated statement SELECT t.* FROM SYS.ALL_TABLES AS t WHERE 1=0 uses the AS keyword for a table alias, which Oracle rejects with ORA-00933. You might need to fork the built-in Oracle connector and replace its driver with your custom one.
Jarcec
