Sqoop can't convert long to timestamp when exporting data from HDFS to Oracle

I have a csv file in hdfs with this format:
000000131,2020-07-22,0.0,"","",1595332359218,khf987ksdfi34
000000112,2020-07-22,0.0,"","",1595442610265,khf987ksdfi34
000000150,2020-07-22,0.0,"","",1595442610438,khf987ksdfi34
I want to export this file to Oracle using Sqoop like this:
sqoop export --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=oracledb)(port=1521))(connect_data=(service_name=stgdb)))" --table CORE_ETL.DEPOSIT_TURNOVER --username xxxx --password xxxx --export-dir /tmp/merged_deposit_turnover/ --input-fields-terminated-by "," --input-lines-terminated-by '\n' --input-optionally-enclosed-by '\"' --map-column-java DATE=java.sql.Date,INSERT_TS=java.sql.Timestamp
but the process ended with this error:
Caused by: java.lang.RuntimeException: Can't parse input data: '1595332359218'
    at CORE_ETL_DEPOSIT_TURNOVER.__loadFromFields(CORE_ETL_DEPOSIT_TURNOVER.java:546)
    at CORE_ETL_DEPOSIT_TURNOVER.parse(CORE_ETL_DEPOSIT_TURNOVER.java:431)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:88)
    ... 10 more
Caused by: java.lang.IllegalArgumentException
    at java.sql.Date.valueOf(Date.java:143)
    at CORE_ETL_DEPOSIT_TURNOVER.__loadFromFields(CORE_ETL_DEPOSIT_TURNOVER.java:529)
    ... 12 more
I wonder whether there is a way to export this file to Oracle without changing the format of the data in HDFS.
Also, the Oracle schema:

As per the official Sqoop documentation:
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_export_date_and_timestamp_data_types_into_oracle
When exporting data from HDFS, the Sqoop export command will fail if the data is not in the required format. The required format for a timestamp is yyyy-mm-dd hh24:mi:ss.ff, so you will have to reformat the timestamps in your files to conform to that format in order to export properly to Oracle.
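For example, if the staged data is also available as a Hive table, the INSERT_TS column (epoch milliseconds such as 1595332359218) could be rewritten in that format before running the export. A rough Hive sketch, where the table and column names are assumptions based on the sample rows and INSERT_TS is assumed to be a BIGINT (note that from_unixtime uses the session time zone):

-- Rewrite the export directory with INSERT_TS as 'yyyy-MM-dd HH:mm:ss.SSS'
INSERT OVERWRITE DIRECTORY '/tmp/merged_deposit_turnover_formatted'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT
  account_no,
  turnover_date,
  amount,
  col4,
  col5,
  CONCAT(
    from_unixtime(CAST(insert_ts / 1000 AS BIGINT)),   -- yyyy-MM-dd HH:mm:ss
    '.',
    LPAD(CAST(insert_ts % 1000 AS STRING), 3, '0')     -- millisecond fraction
  ) AS insert_ts,
  batch_id
FROM deposit_turnover_staging;

The export can then point --export-dir at the rewritten directory and keep --map-column-java INSERT_TS=java.sql.Timestamp.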

Related


I have an Oracle table with XML data stored in it (xmlType). I'm trying to sqoop it to HDFS with the command below, but the XML field is displayed as null in the HDFS file.
sqoop import --connect jdbc:oracle:thin:@DBconnString
--username uname --password pwd
--delete-target-dir
--table sample
--map-column-java column1=String
Can anyone suggest what I am doing wrong?
It is a Sqoop limitation: xmlType is not supported.
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_supported_data_types
There is a workaround described in https://issues.apache.org/jira/browse/SQOOP-2749, which is essentially to convert your xmlType to a CLOB and then map it to String using the following option:
--map-column-java "XMLRECORD=String"

Sqoop export to Oracle - Caused by: java.lang.RuntimeException: Can't parse input data: '\N'

Sqoop export to Oracle fails with the exception below:
Caused by: java.lang.RuntimeException: Can't parse input data: '\N'
I have NULL columns in HDFS.
Below is the command I used:
sqoop export --connect jdbc:oracle:thin:@XXXXXXXXXXXXX \
--username XX \
--password XXXXX \
--table XXXXXXXXXXXXXXXXXX \
--export-dir '/datalake/qa/etl/XXXXXXX/XXXXXXXXXXXX' --input-fields-terminated-by ',' --input-null-string '\\N' --input-null-non-string '\\N'
I also tried --input-null-string "\\\\N" --input-null-non-string "\\\\N", still no luck.
The issue is caused by a NUL character in the text.
For an Oracle database we don't need to mention --input-null-string for NULL values, which I had tried in different ways thinking that was the cause of the issue.
I checked the log files of the failed map task and found the NUL character in the string that was causing the issue.
I resolved the issue by using regexp_replace in the Hive query that writes the HDFS export directory:
regexp_replace(regexp_replace(rtrim(A.chat_agent_text),',','.'),'\0','.')
The issue is resolved now and the Sqoop export is successful.
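A rough sketch of that cleanup step, with hypothetical table, column, and directory names (the chat_agent_text expression comes from the line above):

INSERT OVERWRITE DIRECTORY '/tmp/chat_export_cleaned'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT
  a.chat_id,
  regexp_replace(regexp_replace(rtrim(a.chat_agent_text),',','.'),'\0','.') AS chat_agent_text
FROM chat_transcripts a;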
Observation:
Can't parse input data: '\N' is not always related to NULL values in a column.

Sqoop export from HDFS to Oracle error

Command used:
sqoop export --connect jdbc:oracle:thin:@//xxx:1521/BDWDEV4 --username xxx --password xxx --table TW5T0 --export-dir '/data/raw/oltp/cogen/oraclexport/TW5T0/2015-08-18' -m 8 --input-fields-terminated-by '\001' --lines-terminated-by '\n' --input-escaped-by '\"' --input-optionally-enclosed-by '\"'
The destination table has columns with datatype DATE in Oracle, but as shown in the error, Sqoop is parsing a plain date as a timestamp.
Error:
15/09/11 06:07:12 INFO mapreduce.Job: map 0% reduce 0%
15/09/11 06:07:17 INFO mapreduce.Job: Task Id : attempt_1438142065989_99811_m_000000_0, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: Can't parse input data: '2015-08-15'
at TZ401.__loadFromFields(TZ401.java:792)
at TZ401.parse(TZ401.java:645)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more
Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
at java.sql.Timestamp.valueOf(Timestamp.java:202)
at TZ401.__loadFromFields(TZ401.java:709)
... 12 more
Instead of changing your data files in Hadoop, you should use the --map-column-java argument in your sqoop export.
If you have for example two DATE columns named DATE_COLUMN_1 and DATE_COLUMN_2 in your Oracle table, then you can add the following argument to your sqoop command:
--map-column-java DATE_COLUMN_1=java.sql.Date,DATE_COLUMN_2=java.sql.Date
As mentioned before, the JDBC format has to be used in your Hadoop text file. But in this case yyyy-mm-dd will work.
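Applied to the command from the question, the export would look something like this (DATE_COLUMN_1 and DATE_COLUMN_2 stand in for the actual DATE column names in TW5T0):

sqoop export --connect jdbc:oracle:thin:@//xxx:1521/BDWDEV4 --username xxx --password xxx \
  --table TW5T0 --export-dir '/data/raw/oltp/cogen/oraclexport/TW5T0/2015-08-18' -m 8 \
  --input-fields-terminated-by '\001' --lines-terminated-by '\n' \
  --input-escaped-by '\"' --input-optionally-enclosed-by '\"' \
  --map-column-java DATE_COLUMN_1=java.sql.Date,DATE_COLUMN_2=java.sql.Date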
From
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_dates_and_times,
Oracle JDBC represents DATE and TIME SQL types as TIMESTAMP values. Any DATE columns in an Oracle database will be imported as a TIMESTAMP in Sqoop, and Sqoop-generated code will store these values in java.sql.Timestamp fields.
When exporting data back to a database, Sqoop parses text fields as
TIMESTAMP types (with the form yyyy-mm-dd HH:MM:SS.ffffffff) even if
you expect these fields to be formatted with the JDBC date escape
format of yyyy-mm-dd. Dates exported to Oracle should be formatted as
full timestamps.
So you would need to format the dates in your files to conform to the format yyyy-mm-dd HH:MM:SS.ffffffff before exporting to Oracle.
EDIT:
Answering the comment,
There are around 70 files (tables) in HDFS I need to export. So, in all files I need to change the date from yyyy-mm-dd to yyyy-mm-dd HH:MM:SS.ffffffff. Is there any simple way to format it?
Well, you could write an awk script to do that for you (a sketch follows the steps below), or else you can check whether the idea below works:
1. Create a new temporary table TEMPIMPORT with the same structure as table TW5T0, except change the column that has the DATE datatype to VARCHAR2.
2. Load into the new temporary table TEMPIMPORT using Sqoop.
3. Run the DML below to export the data back into TW5T0 (and commit, of course):
insert into tw5t0 (select [[all_your_columns_here_except_date_column]],to_date(date_column,'yyyy-mm-dd') from tempimport);
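For the awk route mentioned above, a rough sketch that rewrites the files into a new HDFS directory, appending " 00:00:00.0" to the DATE value so it matches the full timestamp format (it assumes ^A (\001) delimited files with the date in the third field; adjust the field index and paths to the real layout):

SRC=/data/raw/oltp/cogen/oraclexport/TW5T0/2015-08-18
DST=/data/raw/oltp/cogen/oraclexport/TW5T0/2015-08-18_ts
hdfs dfs -mkdir -p "$DST"
hdfs dfs -cat "$SRC/part-*" \
  | awk -F'\001' 'BEGIN{OFS="\001"} {$3 = $3 " 00:00:00.0"; print}' \
  | hdfs dfs -put - "$DST/part-00000"

The sqoop export --export-dir would then point at the new directory.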
I used --connection-param-file ora.properties in the Sqoop export.
ora.properties contains:
oracle.jdbc.mapDateToTimestamp=false
Oracle drivers map oracle.sql.DATE to java.sql.Timestamp, retaining
the time information. If you still want the incorrect but 10g
compatible oracle.sql.DATE to java.sql.Date mapping, then you can get
it by setting the value of mapDateToTimestamp flag to false (default
is true).
https://docs.oracle.com/cd/E11882_01/java.112/e16548/apxref.htm#JJDBC28920
For using with sqoop you need to add option:
--connection-param-file conn-param-file.txt
conn-param-file.txt:
oracle.jdbc.mapDateToTimestamp=false
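Put together with the export command from the question, that looks roughly like this:

echo 'oracle.jdbc.mapDateToTimestamp=false' > conn-param-file.txt
sqoop export --connect jdbc:oracle:thin:@//xxx:1521/BDWDEV4 --username xxx --password xxx \
  --connection-param-file conn-param-file.txt \
  --table TW5T0 --export-dir '/data/raw/oltp/cogen/oraclexport/TW5T0/2015-08-18' -m 8 \
  --input-fields-terminated-by '\001' --lines-terminated-by '\n' \
  --input-escaped-by '\"' --input-optionally-enclosed-by '\"'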
If the Hive table's column sequence doesn't match the RDBMS table's column order, there is a chance of hitting the same error.
I resolved my issue by recreating the table in the RDBMS with the columns rearranged.
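If recreating the table is not convenient, a rough alternative sketch is Sqoop's --columns argument, which names the target columns in the order the fields appear in the HDFS files (all names below are placeholders):

sqoop export --connect jdbc:oracle:thin:@//dbhost:1521/SVC \
  --username xxxx --password xxxx \
  --table TARGET_TABLE \
  --columns "COL_IN_FILE_ORDER_1,COL_IN_FILE_ORDER_2,COL_IN_FILE_ORDER_3" \
  --export-dir /path/to/export/dir \
  --input-fields-terminated-by ','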

How to use sqoop to export the default hive delimited output?

I have a Hive query:
insert overwrite directory '/x'
select ...
Then I'm trying to export the data with Sqoop:
sqoop export --connect jdbc:mysql://mysqlm/site --username site --password site --table x_data --export-dir /x --input-fields-terminated-by 0x01 --lines-terminated-by '\n'
But this seems to fail to parse the fields according to the delimiter.
What am I missing?
I think the --input-fields-terminated-by 0x01 part doesn't work as expected.
I do not want to create additional tables in Hive that contain the query results.
Stack trace:
2013-09-24 05:39:21,705 ERROR org.apache.sqoop.mapreduce.TextExportMapper: Exception:
java.lang.NumberFormatException: For input string: "9-2"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
...
The vi view of the output:
16-09-2013 23^A1182^A-1^APub_X^A21782^AIT^A1^A0^A0^A0^A0^A0.0^A0.0^A0.0
16-09-2013 23^A1182^A6975^ASoMo Audience Corp^A2336143^AUS^A1^A1^A0^A0^A0^A0.2^A0.0^A0.0
16-09-2013 23^A1183^A-1^APub_UK, Inc.^A1564001^AGB^A1^A0^A0^A0^A0^A0.0^A0.0^A0.0
17-09-2013 00^A1120^A-1^APub_US^A911^A--^A181^A0^A0^A0^A0^A0.0^A0.0^A0.0
I've found the correct solution for that special character in bash:
#!/bin/bash
# ... your script
hive_char=$( printf "\x01" )
sqoop export --connect jdbc:mysql://mysqlm/site --username site --password site --table x_data --export-dir /x --input-fields-terminated-by ${hive_char} --lines-terminated-by '\n'
The problem was correct separator recognition (nothing to do with types or schema), and that was achieved with hive_char.
Another way to enter this special character on the Linux command line is to type Ctrl+V followed by Ctrl+A.
Using
--input-fields-terminated-by '\001' --lines-terminated-by '\n'
as flags in the sqoop export command seems to do the trick for me.
So, in your example, the full command would be:
sqoop export --connect jdbc:mysql://mysqlm/site --username site --password site --table x_data --export-dir /x --input-fields-terminated-by '\001' --lines-terminated-by '\n'
I think it's a datatype mismatch with your RDBMS schema.
Try to find the column that the "9-2" value maps to and check its datatype in the RDBMS schema.
If it's int or numeric then Sqoop will try to parse the value and insert it, and "9-2" is not a numeric value.
Let me know if this doesn't work.
It seems like Sqoop is taking '0' as the delimiter.
You are getting the error because the first column in your MySQL table is probably a varchar and the second column is a number.
Splitting the line below on '0':
16-09-2013 23^A1182^A-1^APub_X^A21782^AIT^A1^A0^A0^A0^A0^A0.0^A0.0^A0.0
the first column parsed by Sqoop is 16- and the second column is 9-2.
So it's better to specify the delimiter in quotes ('0x01'),
or
(it's always easier and gives better control) use a Hive CREATE TABLE statement such as:
create table tablename row format delimited fields terminated by '\t' as select ...
and specify '\t' as the delimiter in your Sqoop command.
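A rough sketch of that Hive-side alternative (the table name x_data_export is hypothetical):

-- Materialize the query with an explicit tab delimiter
CREATE TABLE x_data_export
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
AS SELECT ...;   -- the original query goes here

The Sqoop export then points --export-dir at this table's warehouse directory and uses --input-fields-terminated-by '\t'.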

Can I use Sqoop to import data into RCFile format?

According to http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1764646
You can import data in one of two file formats: delimited text or
SequenceFiles.
But what about RCFile?
Is it possible to use Sqoop to import data from Oracle DB into HDFS in RCFile format?
If yes, how to do it?
Sqoop does not currently support RCFiles. There is a JIRA, SQOOP-640, to add this functionality.
Step 1: Create an RCFile-formatted table (base) in Hive.
CREATE TABLE IF NOT EXISTS tablename (hivecolumns) STORED AS RCFILE
Step 2: Sqoop import into this RCFile table using the HCatalog integration.
sqoop import \
--connect sourcedburl \
--username XXXX \
--password XXXX \
--table source_table \
--hcatalog-database hivedb \
--hcatalog-table tablename
[ HCatalog’s table abstraction presents users with a relational view of data in the Hadoop distributed file system (HDFS) and ensures that users need not worry about where or in what format their data is stored — RCFile format, text files, SequenceFiles, or ORC files.]
