Sqoop: java.lang.Double cannot be cast to java.nio.ByteBuffer - hadoop

I'm trying to import a table from Oracle to Hive and I keep getting this error.
I'm executing:
sqoop import -Dmapreduce.job.queuename=XXXXXX \
--connect jdbc:oracle:XXX:@//XXXXXX \
--username XXXX --password-file=XXXX \
--query "select descripcion,zona from base.test" \
--mapreduce-job-name jobSqoop-test \
--target-dir /data/user/hive/warehouse/base.db/test \
--split-by zona \
--map-column-java "ZONA=Double,DESCRIPCION=String" \
--delete-target-dir --as-parquetfile --compression-codec=snappy \
--null-string '\N' --null-non-string '\N' \
--num-mappers 1 \
--hive-import --hive-overwrite --hive-database base --hive-table test \
--direct
Error: java.lang.ClassCastException: java.lang.Double cannot be cast to java.nio.ByteBuffer
at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:338)
at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:271)
at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:187)
at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:161)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:179)
at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.sqoop.mapreduce.parquet.hadoop.HadoopParquetImportMapper.write(HadoopParquetImportMapper.java:61)
at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:72)
at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:38)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Any ideas?
Thanks

Fixed it by adding:
-Dsqoop.parquet.logical_types.decimal.enable=false
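For reference, a minimal sketch of where the property goes: it is a generic -D argument, so it belongs right after the tool name, before the tool-specific options (same placeholders as the original command; the remaining options stay as they were):
sqoop import \
-Dmapreduce.job.queuename=XXXXXX \
-Dsqoop.parquet.logical_types.decimal.enable=false \
--connect jdbc:oracle:XXX:@//XXXXXX \
--username XXXX --password-file=XXXX \
--query "select descripcion,zona from base.test" \
--map-column-java "ZONA=Double,DESCRIPCION=String" \
--as-parquetfile --compression-codec=snappy \
--hive-import --hive-overwrite --hive-database base --hive-table test \
--target-dir /data/user/hive/warehouse/base.db/test \
--delete-target-dir --num-mappers 1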

Related

Sqoop Hive import: Dataset name is not alphanumeric (plus '_')

I would like to import data from Oracle to Hive as a Parquet file using Sqoop.
I have been trying to import the data with the following command:
sqoop import --as-parquetfile --connect jdbc:oracle:thin:@10.222.14.11:1521/eservice --username MOJETL --password-file file:///home/$(whoami)/MOJ_Analytic/moj_analytic/conf/.djoppassword --query 'SELECT * FROM CMST_OFFENSE_RECORD_FAMILY WHERE $CONDITIONS' --fields-terminated-by ',' --escaped-by ',' --hive-overwrite --hive-import --hive-database default --hive-table tmp3_cmst_offense_record_family --hive-partition-key load_dt --hive-partition-value '20200213' --split-by cmst_offense_record_family_ref --target-dir hdfs://nameservice1:8020/landing/tmp3_cmst_offense_record_family/load_dt=20200213
I get the following error:
ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name default.tmp3_cmst_offense_record_family is not alphanumeric (plus '_')
org.kitesdk.data.ValidationException: Dataset name default.tmp3_cmst_offense_record_family is not alphanumeric (plus '_')
I've also tried removing the Hive-related options:
sqoop import --as-parquetfile --connect jdbc:oracle:thin:@10.222.14.11:1521/eservice --username MOJETL --password-file file:///home/$(whoami)/MOJ_Analytic/moj_analytic/conf/.djoppassword --query 'SELECT * FROM CMST_OFFENSE_RECORD_FAMILY WHERE $CONDITIONS' --fields-terminated-by ',' --escaped-by ',' --split-by cmst_offense_record_family_ref --target-dir hdfs://nameservice1:8020/landing/tmp3_cmst_offense_record_family/load_dt=20200213
I still got the same validation error:
ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name load_dt=20200213 is not alphanumeric (plus '_')
org.kitesdk.data.ValidationException: Dataset name load_dt=20200213 is not alphanumeric (plus '_')
Please try rewriting this part:
--hive-table default.tmp3_cmst_offense_record_family
with this one:
--hive-table tmp3_cmst_offense_record_family
You have already specified the database name with the --hive-database option.
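That is, the Hive-related part of the command should look roughly like this (a sketch; the rest of the options stay unchanged):
--hive-import --hive-overwrite \
--hive-database default \
--hive-table tmp3_cmst_offense_record_family \
--hive-partition-key load_dt --hive-partition-value '20200213'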

Sqoop import from SybaseIQ to Hive - java.io.IOException: SQLException in nextKeyValue

When I am trying to import a table to Hive, I am getting a strange error.
Query:
sqoop import --connect 'jdbc:sybase:Tds:10.100.*.***:5500/DATABASE=****' --driver 'com.sybase.jdbc3.jdbc.SybDriver' --username "****" --password "***" --table dw.dm_court_courttype --direct -m 1 --hive-import --create-hive-table --hive-table DM_court_courtcype --target-dir "/user/hive/warehouse/DM_Court_CourtType" --verbose
Error:
java.io.IOException: SQLException in nextKeyValue
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:565)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:796)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: com.sybase.jdbc3.jdbc.SybSQLException: SQL Anywhere Error -131: Syntax error near '.' on line 1
at com.sybase.jdbc3.tds.Tds.a(Unknown Source)
at com.sybase.jdbc3.tds.Tds.nextResult(Unknown Source)
at com.sybase.jdbc3.tds.Tds.getResultSetResult(Unknown Source)
at com.sybase.jdbc3.tds.TdsCursor.open(Unknown Source)
at com.sybase.jdbc3.jdbc.SybStatement.executeQuery(Unknown Source)
at com.sybase.jdbc3.jdbc.SybPreparedStatement.executeQuery(Unknown Source)
at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
... 12 more
Don't use the database name with the table name.
Use --table dm_court_courttype instead of
--table dw.dm_court_courttype
Try this:
sqoop import --connect 'jdbc:sybase:Tds:10.100.*.***:5500/DATABASE=****' --driver 'com.sybase.jdbc3.jdbc.SybDriver' --username "****" --password "***" --table dm_court_courttype --direct -m 1 --hive-import --create-hive-table --hive-table DM_court_courtcype --target-dir "/user/hive/warehouse/DM_Court_CourtType" --verbose

Sqoop 1.4.4, from Oracle to Hive, some fields in lower case: ERROR

I am using Sqoop 1.4.4 to transfer data from Oracle to Hive, using this command:
sqoop job --create vjbkeufwekdfas -- import --split-by "Birthdate" \
--check-column "Birthdate" --hive-database chinacloud --hive-table hive_vjbkeufwekdfas --target-dir /tmp/vjbkeufwekdfas --incremental lastmodified --username GA_TESTER1 --password 123456 --connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL --query "SELECT \"Name\",\"ID\",\"Birthdate\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" --m 1 --class-name vjbkeufwekdfas --hive-import \
--fields-terminated-by '^X' --hive-drop-import-delims --null-non-string '' --null-string ''
It doesn't work, because Sqoop's tool.ImportTool checks the columns first:
16/10/08 14:58:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/f068e5d884f3929b6415cd8318085fea/vjbkeufwekdfas.jar
16/10/08 14:58:03 INFO manager.SqlManager: Executing SQL statement: SELECT "NAME","ID","Birthdate" FROM GA_TESTER1.gmy_table1 where (1 = 0)
16/10/08 14:58:03 ERROR util.SqlTypeMap: It seems like you are looking up a column that does not
16/10/08 14:58:03 ERROR util.SqlTypeMap: exist in the table. Please ensure that you've specified
16/10/08 14:58:03 ERROR util.SqlTypeMap: correct column names in Sqoop options.
16/10/08 14:58:03 ERROR tool.ImportTool: Imported Failed: column not found: "Birthdate"
However, if I don't use the double quotation marks:
sqoop job --create vjbkeufwekdfas -- import --split-by Birthdate \
--check-column Birthdate --hive-database chinacloud --hive-table hive_vjbkeufwekdfas --target-dir /tmp/vjbkeufwekdfas --incremental lastmodified --username GA_TESTER1 --password 123456 --connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL --query "SELECT \"Name\",\"ID\",\"Birthdate\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" --m 1 --class-name vjbkeufwekdfas --hive-import \
--fields-terminated-by '^X' --hive-drop-import-delims --null-non-string '' --null-string ''
Oracle then throws an error, because the field name contains lower-case letters:
16/10/08 14:37:16 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/1f05a7e94340dd92c9e3b11db1a5db46/vjbkeufwekdfas.jar
16/10/08 14:37:16 INFO manager.SqlManager: Executing SQL statement: SELECT "NAME","ID","Birthdate" FROM GA_TESTER1.gmy_table1 where (1 = 0)
16/10/08 14:37:16 INFO tool.ImportTool: Incremental import based on column Birthdate
16/10/08 14:37:16 INFO tool.ImportTool: Upper bound value: TO_DATE('2016-10-08 14:37:20', 'YYYY-MM-DD HH24:MI:SS')
16/10/08 14:37:16 INFO mapreduce.ImportJobBase: Beginning query import.
16/10/08 14:37:16 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/10/08 14:37:17 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/10/08 14:37:17 INFO client.RMProxy: Connecting to ResourceManager at master.huacloud.test/172.16.50.21:8032
16/10/08 14:37:25 INFO mapreduce.JobSubmitter: number of splits:1
16/10/08 14:37:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1474871676561_71208
16/10/08 14:37:25 INFO impl.YarnClientImpl: Submitted application application_1474871676561_71208
16/10/08 14:37:26 INFO mapreduce.Job: The url to track the job: http://master.huacloud.test:8088/proxy/application_1474871676561_71208/
16/10/08 14:37:26 INFO mapreduce.Job: Running job: job_1474871676561_71208
16/10/08 14:37:33 INFO mapreduce.Job: Job job_1474871676561_71208 running in uber mode : false
16/10/08 14:37:33 INFO mapreduce.Job: map 0% reduce 0%
16/10/08 14:37:38 INFO mapreduce.Job: Task Id : attempt_1474871676561_71208_m_000000_0, Status : FAILED
Error: java.io.IOException: SQLException in nextKeyValue
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:266)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.sql.SQLSyntaxErrorException: ORA-00904: "BIRTHDATE": invalid identifier
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:439)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:395)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:802)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:436)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:186)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:521)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:205)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:861)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1145)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1267)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3449)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3493)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1491)
at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:237)
... 12 more
I am so confused, any help?
I think it could be a case-sensitivity problem. In general, Oracle table and column names are not case sensitive, but they become case sensitive when they are created with quotation marks.
Try the following:
sqoop job --create vjbkeufwekdfas -- import --split-by Birthdate \
--check-column Birthdate --hive-database chinacloud --hive-table hive_vjbkeufwekdfas --target-dir /tmp/vjbkeufwekdfas --incremental lastmodified --username GA_TESTER1 --password 123456 --connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL --query "SELECT \"NAME\",\"ID\",\"BIRTHDATE\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" --m 1 --class-name vjbkeufwekdfas --hive-import \
--fields-terminated-by '^X' --hive-drop-import-delims --null-non-string '' --null-string ''
If it's still not working, try sqoop eval first and make sure the query works fine:
sqoop eval --connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL --query "SELECT \"NAME\",\"ID\",\"BIRTHDATE\" FROM GA_TESTER1.gmy_table1 where 1=1"
As per the Sqoop user guide:
"While the Hadoop generic arguments must precede any import arguments, you can type the import arguments in any order with respect to one another."
Please check your argument sequence.
Please try the following:
sqoop job --create vjbkeufwekdfas \
-- import \
--connect jdbc:oracle:thin:@172.16.50.12:1521:ORCL \
--username GA_TESTER1 \
--password 123456 \
--query "SELECT \"Name\",\"ID\",\"Birthdate\" FROM GA_TESTER1.gmy_table1 where \$CONDITIONS" \
--target-dir /tmp/vjbkeufwekdfas \
--split-by "Birthdate" \
--check-column "Birthdate" \
--incremental lastmodified \
--hive-import \
--hive-database chinacloud \
--hive-table hive_vjbkeufwekdfas \
--class-name vjbkeufwekdfas \
--fields-terminated-by '^X' \
--hive-drop-import-delims \
--null-non-string '' \
--null-string '' \
--m 1
Also use:
--append \
--last-value "0001-01-01 01:01:01" \
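That is, the incremental part of the saved job would look roughly like this (a sketch; --last-value is only the initial lower bound, and the saved job keeps track of the last imported value on later runs):
--check-column "Birthdate" \
--incremental lastmodified \
--last-value "0001-01-01 01:01:01" \
--append \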

Sqoop Export Hive String to Oracle CLOB

I want to export STRING data from Hive to CLOB in Oracle.
Command:
sqoop export -Dsqoop.export.records.per.statement=1 --connect 'jdbc:oracle:thin:@192.168.41.67:1521:orcl' --username ILABUSER --password impetus --table ILABUSER.CDT_ORC_1 --export-dir /user/dev/db/123 --input-fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' -m 2
Exception:
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Could not buffer record
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:218)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
... 10 more
Caused by: java.lang.CloneNotSupportedException: com.cloudera.sqoop.lib.ClobRef
at java.lang.Object.clone(Native Method)
at org.apache.sqoop.lib.LobRef.clone(LobRef.java:109)
at ILABUSER_CDT_ORC_1.clone(ILABUSER_CDT_ORC_1.java:322)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:213)
... 15 more
As a workaround I used the --map-column-java option.
I mapped the CLOB column (named col_clob) to String in Java.
I added the following to the above command:
--map-column-java col_clob=String
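Put together, the export command would look roughly like this (a sketch; col_clob stands for the actual CLOB column name in ILABUSER.CDT_ORC_1):
sqoop export -Dsqoop.export.records.per.statement=1 \
--connect 'jdbc:oracle:thin:@192.168.41.67:1521:orcl' \
--username ILABUSER --password impetus \
--table ILABUSER.CDT_ORC_1 \
--export-dir /user/dev/db/123 \
--input-fields-terminated-by '\001' \
--input-null-string '\\N' --input-null-non-string '\\N' \
--map-column-java col_clob=String \
-m 2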

Got exception running Sqoop: java.lang.NullPointerException using --query and --as-parquetfile

I am trying to import table data from Redshift to HDFS (in Parquet format) and am facing the error shown below:
15/06/25 11:05:42 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:97)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Command used:
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
--connect "jdbc:postgresql://:5439/events" \
--username "username" --password "password" \
--query "SELECT * FROM mobile_og.pages WHERE \$CONDITIONS" \
--split-by anonymous_id \
--target-dir /user/huser/pq_mobile_og_pages_2 \
--as-parquetfile
It works fine when the --as-parquetfile option is removed from the above command.
It is confirmed to be a bug SQOOP-2571.
If you want to import all the data of a table, you can run a command like this instead:
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
--connect "jdbc:postgresql://:5439/events" \
--username "username" --password "password" \
--table mobile_og.pages \
--split-by anonymous_id \
--target-dir /user/huser/pq_mobile_og_pages_2 \
--as-parquetfile
The --where parameter is also useful. Check the user manual.
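For example, a filtered table import with --where might look like this (a sketch; the received_at column and the date filter are illustrative placeholders only, not from the original question):
# NOTE: the --where predicate below is illustrative; substitute a real column and filter
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
--connect "jdbc:postgresql://:5439/events" \
--username "username" --password "password" \
--table mobile_og.pages \
--where "received_at >= '2015-06-01'" \
--split-by anonymous_id \
--target-dir /user/huser/pq_mobile_og_pages_2 \
--as-parquetfile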
