Sqoop export is failing. Can't parse input data: '<data>' - hadoop

When I run the Sqoop export command from the terminal, it works fine. But when I run the same command from an Oozie workflow, it throws the following error:
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:122)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
Caused by: java.lang.RuntimeException: Can't parse input data: '2018-05-14,967,893,74,8863.330000000005,7617.07,1246.26'
at adjust_jazz_compare.__loadFromFields(adjust_jazz_compare.java:512)
at adjust_jazz_compare.parse(adjust_jazz_compare.java:430)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:89)
... 10 more
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:854)
at adjust_jazz_compare.__loadFromFields(adjust_jazz_compare.java:482)
... 12 more
Below is the command I am using:
export --connect <jdbc> --username <user> --password <pass> --table <table> --export-dir <dir> --input-fields-terminated-by ',' --input-lines-terminated-by '\n'
Table properties in Hive:
hive.inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim=,, serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=3973, numRows=70, rawDataSize=3903, COLUMN_STATS_ACCURATE=true, numFiles=1, transient_lastDdlTime=1530647041}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)

The row containing '2018-05-14,967,893,74,8863.330000000005,7617.07,1246.26' cannot be parsed by Sqoop.
From the Hive terminal, check the exact format of this row. There may be a special character or extra spaces; the java.util.NoSuchElementException in the trace also suggests the row has fewer fields than the target table has columns, which would prevent it from being parsed.
Please share the schema of the table in Hive, and also your Hive query. A quick way to inspect both the metadata and the raw file is sketched below.
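A minimal way to check the delimiter, the column list, and the raw rows Sqoop reads, assuming the Hive table and export directory from the question (both kept as placeholders):
# Show the columns and the field delimiter recorded in the metastore
hive -e "DESCRIBE FORMATTED <table>;"
# Look at the raw rows in the export directory that Sqoop actually parses
hdfs dfs -cat <dir>/part-* | head -n 5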

Can you show how you created the table? Maybe you forgot the fields delimiter in Hive:
FIELDS TERMINATED BY ','
Example create table code:
CREATE TABLE IF NOT EXISTS employee (
eid int,
name String,
salary String,
destination String)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
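For the comma-delimited data in this question, a comparable DDL would look like the following. This is only a hypothetical sketch: the table name is taken from the generated class in the stack trace, and the column names and types are guesses based on the sample row.
CREATE TABLE IF NOT EXISTS adjust_jazz_compare (
report_date STRING,
metric1 INT,
metric2 INT,
metric3 INT,
amount1 DOUBLE,
amount2 DOUBLE,
amount3 DOUBLE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;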

Related

Sqoop: java.lang.Double cannot be cast to java.nio.ByteBuffer

I'm trying to import a table from Oracle to Hive and I keep getting this error.
I'm executing:
sqoop import -Dmapreduce.job.queuename=XXXXXX --connect jdbc:oracle:XXX:@//XXXXXX --username XXXX --password-file=XXXX --query "select descripcion,zona from base.test" --mapreduce-job-name jobSqoop-test --target-dir /data/user/hive/warehouse/base.db/test --split-by zona --map-column-java "ZONA=Double,DESCRIPCION=String" --delete-target-dir --as-parquetfile --compression-codec=snappy --null-string '\N' --null-non-string '\N' --num-mappers 1 --hive-import --hive-overwrite --hive-database base --hive-table test --direct
Error: java.lang.ClassCastException: java.lang.Double cannot be cast to java.nio.ByteBuffer
at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:338)
at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:271)
at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:187)
at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:161)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:179)
at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.sqoop.mapreduce.parquet.hadoop.HadoopParquetImportMapper.write(HadoopParquetImportMapper.java:61)
at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:72)
at org.apache.sqoop.mapreduce.ParquetImportMapper.map(ParquetImportMapper.java:38)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Any ideas?
Thanks.
It was fixed by adding:
-Dsqoop.parquet.logical_types.decimal.enable=false
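For reference, generic Hadoop -D properties must come immediately after the tool name, before the Sqoop-specific arguments. An abridged sketch based on the command above (placeholders unchanged):
sqoop import -Dsqoop.parquet.logical_types.decimal.enable=false -Dmapreduce.job.queuename=XXXXXX --connect jdbc:oracle:XXX:@//XXXXXX --username XXXX --password-file=XXXX --query "select descripcion,zona from base.test" --map-column-java "ZONA=Double,DESCRIPCION=String" --as-parquetfile --num-mappers 1 --hive-import --hive-database base --hive-table test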

Sqoop2 HBase Import: Could not insert row with null value for row-key column

I am trying to migrate data from Oracle to HBase with a Sqoop job. The data appears to be exported successfully, but an error is thrown while importing it into HBase.
Job1:
sqoop import --verbose --connect *** --username *** --password *** --table 'abc' --columns "MID,EID,RTIMESTAMP,VALUE,UTIMESTAMP" --split-by 'abc.ID' --hbase-table "HPVSQOOP" --column-family "cf1" --hbase-row-key MID,EID,RTIMESTAMP --num-mappers 4 --hbase-bulkload
where ID is the primary key in Oracle, but I want my HBase row-key to be MID_EID_RTIMESTAMP.
The MapReduce job failed with the following error:
INFO mapreduce.Job: Task Id : attempt_1492489711789_0014_m_000003_2, Status : FAILED
Error: java.io.IOException: Could not insert row with null value for row-key column: MID,EID,RTIMESTAMP
at org.apache.sqoop.hbase.ToStringPutTransformer.getPutCommand(ToStringPutTransformer.java:146)
at org.apache.sqoop.mapreduce.HBaseBulkImportMapper.map(HBaseBulkImportMapper.java:83)
Another job, using --query, is also not working with the HBase import.
Job2:
sqoop import --verbose --connect *** --username *** --password *** --query "select MID,EID,VALUE,RTIMESTAMP,UTIMESTAMP,ID from database.abc where \$CONDITIONS" --split-by 'abc.ID' --hbase-table "HPVSQOOP" --column-family "cf1" --hbase-row-key "MID,EID,RTIMESTAMP" --num-mappers 4 --hbase-bulkload
It ended up throwing an error:
ERROR sqoop.Sqoop: Got exception running Sqoop:
java.lang.NullPointerException java.lang.NullPointerException
If the column names are lowercase in your database, you should use lowercase names on the command line as well, like this: --hbase-row-key "mid,eid..."
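A minimal sketch of Job1 with the row-key columns written in lowercase, assuming the database reports those column names in lowercase (connection placeholders kept from the question):
sqoop import --verbose --connect *** --username *** --password *** --table 'abc' --columns "MID,EID,RTIMESTAMP,VALUE,UTIMESTAMP" --split-by 'abc.ID' --hbase-table "HPVSQOOP" --column-family "cf1" --hbase-row-key "mid,eid,rtimestamp" --num-mappers 4 --hbase-bulkload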

How to pass date into shell script for a sqoop command dynamically?

I'm working on a Sqoop import with the following command:
#!/bin/bash
while IFS=":" read -r server dbname table; do
sqoop eval --connect jdbc:mysql://$server/$dbname --username root --password cloudera --table mydata --hive-import --hive-table dynpart --check-column id --last-value $(hive -e "select max(id) from dynpart"); --hive-partition-key 'thisday' --hive-partition-value '01-01-2016'
done<tables.txt
I'm doing the partition for every day.
Hive table:
create table dynpart(id int, name char(30), city char(30))
partitioned by(thisday char(10))
row format delimited
fields terminated by ','
stored as textfile
location '/hive/mytables'
tblproperties("comment"="partition column: thisday structure is dd-mm-yyyy");
But I don't want to give the partition value directly, because I want to create a Sqoop job and run it every day. In the script, how can I pass the date value to the Sqoop command dynamically (format: dd/mm/yyyy) instead of giving it directly?
Any help is appreciated.
You can use the shell command date to get it (Ubuntu 14.04):
$ date +%d/%m/%Y
22/03/2017
You can try the code below:
#!/bin/bash
DATE=$(date +"%d-%m-%Y")
while IFS=":" read -r server dbname table; do
sqoop eval --connect jdbc:mysql://$server/$dbname --username root --password cloudera --table mydata --hive-import --hive-table dynpart --check-column id --last-value $(hive -e "select max(id) from dynpart") --hive-partition-key 'thisday' --hive-partition-value "$DATE"
done<tables.txt
Hope this helps.

Sqoop export issue: parsing input data

I am trying to use Sqoop to load files on HDFS into Oracle database:
Here is my input data:
Input data:
100|John|Miller|3.10
200|Sam|Madden|4.0
Here is the Sqoop command:
sqoop export --connect "jdbc:oracle:thin:username/password@//host:port/service" --password "pass" --username "user" --export-dir "/hdfs/path/" --input-lines-terminated-by '\n' --input-null-string '\N' --input-null-non-string '\N' --input-fields-terminated-by '|' --table "SCRATCHPAD" --columns ID,FIRST_NAME,LAST_NAME,GPA
Here is the snippet of the error message that I see. Any help with this will be greatly appreciated.
INFO mapreduce.Job: Task Id : attempt_1469238174088_466114_m_000001_1, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.RuntimeException: Can't parse input data: '200 Sam Madden 200.20'

Sqoop Export Hive String to Oracle CLOB

I want to export STRING data from Hive to CLOB in Oracle.
Command :
sqoop export -Dsqoop.export.records.per.statement=1 --connect 'jdbc:oracle:thin:@192.168.41.67:1521:orcl' --username ILABUSER --password impetus --table ILABUSER.CDT_ORC_1 --export-dir /user/dev/db/123 --input-fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' -m 2
Exception:
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Could not buffer record
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:218)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:46)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:84)
... 10 more
Caused by: java.lang.CloneNotSupportedException: com.cloudera.sqoop.lib.ClobRef
at java.lang.Object.clone(Native Method)
at org.apache.sqoop.lib.LobRef.clone(LobRef.java:109)
at ILABUSER_CDT_ORC_1.clone(ILABUSER_CDT_ORC_1.java:322)
at org.apache.sqoop.mapreduce.AsyncSqlRecordWriter.write(AsyncSqlRecordWriter.java:213)
... 15 more
As a workaround, I used the --map-column-java option.
I mapped the CLOB column (named col_clob) to String in Java.
I added the following to the command above:
--map-column-java col_clob=String
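For completeness, a sketch of the original export command with this mapping appended (col_clob stands in for the actual CLOB column name; everything else is unchanged from the question):
sqoop export -Dsqoop.export.records.per.statement=1 --connect 'jdbc:oracle:thin:@192.168.41.67:1521:orcl' --username ILABUSER --password impetus --table ILABUSER.CDT_ORC_1 --export-dir /user/dev/db/123 --input-fields-terminated-by '\001' --input-null-string '\\N' --input-null-non-string '\\N' --map-column-java col_clob=String -m 2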
