Got exception running Sqoop: java.lang.NullPointerException using --query and --as-parquetfile - hadoop

I am trying to import table data from Redshift to HDFS (in Parquet format) and am getting the error shown below:
15/06/25 11:05:42 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:97)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Command used:
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
--connect "jdbc:postgresql://:5439/events" \
--username "username" --password "password" \
--query "SELECT * FROM mobile_og.pages WHERE \$CONDITIONS" \
--split-by anonymous_id \
--target-dir /user/huser/pq_mobile_og_pages_2 \
--as-parquetfile
It works fine when the --as-parquetfile option is removed from the above command.

This is a confirmed bug: SQOOP-2571.
If you want to import all the data in a table, you can use --table instead of --query and run a command like this:
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
--connect "jdbc:postgresql://:5439/events" \
--username "username" --password "password" \
--table mobile_og.pages \
--split-by anonymous_id \
--target-dir /user/huser/pq_mobile_og_pages_2 \
--as-parquetfile
The --where parameter is also useful here; check the user manual.
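For example, if you only need a subset of rows, --where can be combined with --table. This is only a sketch; the filter column received_at and its cutoff value are assumptions for illustration, not taken from the question:
sqoop import --driver "com.amazon.redshift.jdbc41.Driver" \
--connect "jdbc:postgresql://:5439/events" \
--username "username" --password "password" \
--table mobile_og.pages \
--where "received_at >= '2015-06-01'" \
--split-by anonymous_id \
--target-dir /user/huser/pq_mobile_og_pages_2 \
--as-parquetfile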

Related

Sqoop: java.lang.Double cannot be cast to java.nio.ByteBuffer

I'm trying to import a table from Oracle to Hive and I keep getting this error.
I'm executing:
sqoop import -Dmapreduce.job.queuename=XXXXXX \
--connect jdbc:oracle:XXX:@//XXXXXX \
--username XXXX --password-file=XXXX \
--query "select descripcion,zona from base.test" \
--mapreduce-job-name jobSqoop-test \
--target-dir /data/user/hive/warehouse/base.db/test \
--split-by zona \
--map-column-java "ZONA=Double,DESCRIPCION=String" \
--delete-target-dir --as-parquetfile --compression-codec=snappy \
--null-string '\N' --null-non-string '\N' \
--num-mappers 1 \
--hive-import --hive-overwrite --hive-database base --hive-table test \
--direct
Error: java.lang.ClassCastException: java.lang.Double cannot be cast to java.nio.ByteBuffer
at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:338)
at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:271)
at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:187)
at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:161)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:179)
at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.sqoop.mapreduce.parquet.hadoop.HadoopParquetImportMapper.write(HadoopParquetImportMapper.java:61)
at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:72)
at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:38)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Any ideas?
Thanks
It was fixed using:
-Dsqoop.parquet.logical_types.decimal.enable=false
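For reference, a minimal sketch of where that property goes: generic -D Hadoop properties must come right after the tool name (import), before the Sqoop-specific arguments. The connection values below are the placeholders from the question, so this is illustrative only:
sqoop import -Dsqoop.parquet.logical_types.decimal.enable=false \
-Dmapreduce.job.queuename=XXXXXX \
--connect jdbc:oracle:XXX:@//XXXXXX \
--username XXXX --password-file=XXXX \
--query "select descripcion,zona from base.test" \
--map-column-java "ZONA=Double,DESCRIPCION=String" \
--split-by zona --num-mappers 1 \
--delete-target-dir --target-dir /data/user/hive/warehouse/base.db/test \
--as-parquetfile --compression-codec=snappy \
--hive-import --hive-overwrite --hive-database base --hive-table test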

Sqoop Hive Import not support alphanumeric (plus '_')

I would like to import data from Oracle to Hive as a Parquet file using Sqoop.
I have been trying to import the data with the following command:
sqoop import --as-parquetfile \
--connect jdbc:oracle:thin:@10.222.14.11:1521/eservice \
--username MOJETL \
--password-file file:///home/$(whoami)/MOJ_Analytic/moj_analytic/conf/.djoppassword \
--query 'SELECT * FROM CMST_OFFENSE_RECORD_FAMILY WHERE $CONDITIONS' \
--fields-terminated-by ',' --escaped-by ',' \
--hive-overwrite --hive-import \
--hive-database default --hive-table tmp3_cmst_offense_record_family \
--hive-partition-key load_dt --hive-partition-value '20200213' \
--split-by cmst_offense_record_family_ref \
--target-dir hdfs://nameservice1:8020/landing/tmp3_cmst_offense_record_family/load_dt=20200213
I get the following error:
ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name default.tmp3_cmst_offense_record_family is not alphanumeric (plus '_')
org.kitesdk.data.ValidationException: Dataset name default.tmp3_cmst_offense_record_family is not alphanumeric (plus '_')
I've tried removing the Hive options:
sqoop import --as-parquetfile \
--connect jdbc:oracle:thin:@10.222.14.11:1521/eservice \
--username MOJETL \
--password-file file:///home/$(whoami)/MOJ_Analytic/moj_analytic/conf/.djoppassword \
--query 'SELECT * FROM CMST_OFFENSE_RECORD_FAMILY WHERE $CONDITIONS' \
--fields-terminated-by ',' --escaped-by ',' \
--split-by cmst_offense_record_family_ref \
--target-dir hdfs://nameservice1:8020/landing/tmp3_cmst_offense_record_family/load_dt=20200213
I still got essentially the same error:
ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name load_dt=20200213 is not alphanumeric (plus '_')
org.kitesdk.data.ValidationException: Dataset name load_dt=20200213 is not alphanumeric (plus '_')
Please try rewriting this part:
--hive-table default.tmp3_cmst_offense_record_family
with this one:
--hive-table tmp3_cmst_offense_record_family
You have already specified the database name with the --hive-database option.
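The second error, seen after removing the Hive options, comes from the same Kite validation: when writing Parquet, the last path segment of --target-dir is treated as the dataset name, so load_dt=20200213 is rejected as well. A hedged sketch of the non-Hive command with an illustrative directory name that is alphanumeric plus underscores (the directory name is an assumption, not from the question):
sqoop import --as-parquetfile \
--connect jdbc:oracle:thin:@10.222.14.11:1521/eservice \
--username MOJETL \
--password-file file:///home/$(whoami)/MOJ_Analytic/moj_analytic/conf/.djoppassword \
--query 'SELECT * FROM CMST_OFFENSE_RECORD_FAMILY WHERE $CONDITIONS' \
--fields-terminated-by ',' --escaped-by ',' \
--split-by cmst_offense_record_family_ref \
--target-dir hdfs://nameservice1:8020/landing/tmp3_cmst_offense_record_family/load_dt_20200213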

How to pass column names having spaces to sqoop --map-column-java

I have to import data using Sqoop. My source column names have spaces in them, so when I add them to the --map-column-java parameter I get the error below.
Sample Sqoop import:
sqoop import --connect jdbc-con --username "user1" --query "select * from table where \$CONDITIONS" --target-dir /target/path/ -m 1 --map-column-java data col1=String, data col2=String, data col3=String --as-avrodatafile
Column names:
data col1,
data col2,
data col3
Error:
19/03/07 07:31:55 DEBUG sqoop.Sqoop: Malformed mapping. Column mapping should be the form key=value[,key=value]*
java.lang.IllegalArgumentException: Malformed mapping. Column mapping should be the form key=value[,key=value]*
at org.apache.sqoop.SqoopOptions.parseColumnMapping(SqoopOptions.java:1355)
at org.apache.sqoop.SqoopOptions.setMapColumnJava(SqoopOptions.java:1375)
at org.apache.sqoop.tool.BaseSqoopTool.applyCodeGenOptions(BaseSqoopTool.java:1363)
at org.apache.sqoop.tool.ImportTool.applyOptions(ImportTool.java:1011)
at org.apache.sqoop.tool.SqoopTool.parseArguments(SqoopTool.java:435)
at org.apache.sqoop.Sqoop.run(Sqoop.java:135)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Malformed mapping. Column mapping should be the form key=value[,key=value]*
I was able to resolve these issues:
1. Spaces issue: quoting the whole mapping fixes it:
sqoop import --connect jdbc-con --username "user1" --query "select * from table where \$CONDITIONS" --target-dir /target/path/ -m 1 --map-column-java "data col1=String, data col2=String, data col3=String" --as-avrodatafile
2. ERROR tool.ImportTool: Import failed: Cannot convert SQL type 2005:
Three columns in the source are of SQL type 2005 (CLOB) / nvarchar; adding them to --map-column-java as String resolved this issue (see the sketch after this list).
3. org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: Not in union ["null","long"]: 1****
This is caused by using * in the select query, so I modified the Sqoop query to list the columns explicitly:
sqoop import --connect jdbc-con --username "user1" --query "select [col1,data col2,data col3] from table where \$CONDITIONS" --target-dir /target/path/ -m 1 --map-column-java "data col1=String, data col2=String, data col3=String" --as-avrodatafile
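For item 2, a minimal sketch of what the mapping looks like, based on the sample command above; the name clob_col stands in for one of the SQL type 2005 columns and is hypothetical:
sqoop import --connect jdbc-con --username "user1" \
--query "select * from table where \$CONDITIONS" \
--target-dir /target/path/ -m 1 \
--map-column-java "data col1=String,data col2=String,data col3=String,clob_col=String" \
--as-avrodatafile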
Instead of that, you can use the following method; I have used it and it works.
Here I am casting the columns to String so that timestamp values do not get changed to int. Keep that point in mind; it will help you build the mapping string properly.
Sqoop expects the mappings as a comma-separated list in the form 'name of column'='new type'.
Placeholders used below:
address = localhost or your server IP address
port = your database port number
columns-name = the name of your timestamp / datetime column
database-name = your database name
user-name = your database user name
database-password = your database password
Demo to understand the command (placeholders left in):
sqoop import --map-column-java "columns-name=String" --connect jdbc:postgresql://address:port/database-name --username user-name --password database-password --query "select * from demo where \$CONDITIONS;" -m 1 --target-dir /jdbc/star --as-parquetfile --enclosed-by '\"'
Demo for a single column:
sqoop import --map-column-java "date_of_birth=String" --connect jdbc:postgresql://192.168.0.1:1928/alpha --username postgres --password mysecretpass --query "select * from demo where \$CONDITIONS;" -m 1 --target-dir /jdbc/star --as-parquetfile --enclosed-by '\"'
Demo for multiple columns:
sqoop import --map-column-java "date_of_birth=String,create_date=String" --connect jdbc:postgresql://192.168.0.1:1928/alpha --username postgres --password mysecretpass --query "select * from demo where \$CONDITIONS;" -m 1 --target-dir /jdbc/star --as-parquetfile --enclosed-by '\"'

Import BLOB (Image) from Oracle to Hive

I am trying to import BLOB (image) data from Oracle to Hive using the Sqoop command below:
sqoop import --connect jdbc:oracle:thin:@host --username --password --m 3 --table tablename --hive-drop-import-delims --hive-table tablename --target-dir '' --split-by id;
But it was unsuccessful. Remember, the BLOB data is stored in the Oracle database as hexadecimal, and we need to store it in the Hive table as text or binary.
What are the possible ways to do that?
Sqoop does not know how to map the Oracle BLOB datatype into Hive, so you need to specify --map-column-hive COLUMN_BLOB=binary:
sqoop import --connect 'jdbc:oracle:thin:@host' --username $USER --password $Password --table $TABLE --hive-import --hive-table $HiveTable --map-column-hive COL_BLOB=binary --delete-target-dir --target-dir $TargetDir -m 1 -verbose

Sqoop import from Vertica failed

I am trying to import a dataset from Vertica to HDFS using Sqoop.
I am running the following command on the Sqoop machine to import data into HDFS from Vertica v6.0.1-7:
sqoop import -m 1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://10.10.10.10:5433/MYDB" --password dbpassword --username dbusername --target-dir "/user/my/hdfs/dir" --verbose --query 'SELECT * FROM ORDER_V2 LIMIT 10;'
but I am getting this error:
16/02/03 10:33:17 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Query [SELECT * FROM ORDER_V2 LIMIT 10;] must contain '$CONDITIONS' in WHERE clause.
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:300)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:96)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Does anyone know how to do this, considering the usernames on the two machines are different?
Adding vertica_5.1.4_jdk_5.jar to /var/lib/sqoop and adding WHERE $CONDITIONS to the query solved the issue:
sqoop import -m 1 --driver com.vertica.jdbc.Driver --connect "jdbc:vertica://10.10.10.10:5433/MYDB" --password dbpassword --username dbusername --target-dir "/user/my/hdfs/dir" --verbose --query 'SELECT * FROM ORDER_V2 WHERE $CONDITIONS;'
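If more than one mapper is needed later, keep WHERE $CONDITIONS and point Sqoop at a split column; a minimal sketch, assuming ORDER_V2 has a numeric key column named order_id (that column name is an assumption, not from the question):
sqoop import -m 4 --driver com.vertica.jdbc.Driver \
--connect "jdbc:vertica://10.10.10.10:5433/MYDB" \
--username dbusername --password dbpassword \
--query 'SELECT * FROM ORDER_V2 WHERE $CONDITIONS' \
--split-by order_id \
--target-dir "/user/my/hdfs/dir" --verbose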
