Sqoop Export to Oracle - Caused by: java.lang.RuntimeException: Can't parse input data: '\N'

Sqoop export to Oracle fails with the exception below:
Caused by: java.lang.RuntimeException: Can't parse input data: '\N'
I have null columns in HDFS.
Below is the command I used:
sqoop export --connect jdbc:oracle:thin:#XXXXXXXXXXXXX \
--username XX \
--password XXXXX \
--table XXXXXXXXXXXXXXXXXX \
--export-dir '/datalake/qa/etl/XXXXXXX/XXXXXXXXXXXX' \
--input-fields-terminated-by ',' \
--input-null-string '\\N' \
--input-null-non-string '\\N'
I also tried --input-null-string "\\\\N" --input-null-non-string "\\\\N", but still no luck.

The issue is caused by a NUL character in the text.
For an Oracle database we don't need to specify --input-null-string for NULL values; I had tried that in different ways thinking it was the cause of the issue.
I checked the log files of the failed map task and found a NUL character in the string, which was causing the issue.
I resolved the issue by using regexp_replace in the Hive query before exporting to the HDFS directory:
regexp_replace(regexp_replace(rtrim(A.chat_agent_text),',','.'),'\0','.')
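For context, a minimal sketch of that cleanup step (the table name chat_log, the other column names, and the target directory are hypothetical; only the chat_agent_text expression comes from the actual fix):
-- sketch only: write a comma-delimited copy of the data with commas and NUL
-- characters in chat_agent_text replaced by '.', ready for sqoop export
INSERT OVERWRITE DIRECTORY '/datalake/qa/etl/export_stage'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT
  A.chat_id,
  regexp_replace(regexp_replace(rtrim(A.chat_agent_text), ',', '.'), '\0', '.') AS chat_agent_text
FROM chat_log A;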
The issue is now resolved and the Sqoop export succeeds.
Observation:
Can't parse input data: '\N' is not always related to NULL values in a column.

Related

sqoop can't convert long to timestamp when exporting data from HDFS to oracle

I have a csv file in hdfs with this format:
000000131,2020-07-22,0.0,"","",1595332359218,khf987ksdfi34
000000112,2020-07-22,0.0,"","",1595442610265,khf987ksdfi34
000000150,2020-07-22,0.0,"","",1595442610438,khf987ksdfi34
I want to export this file to oracle using sqoop like this:
sqoop export --connect "jdbc:oracle:thin:#(description=(address=(protocol=tcp)(host=oracledb)(port=1521))(connect_data=(service_name=stgdb)))" --table CORE_ETL.DEPOSIT_TURNOVER --username xxxx --password xxxx --export-dir /tmp/merged_deposit_turnover/ --input-fields-terminated-by "," --input-lines-terminated-by '\n' --input-optionally-enclosed-by '\"' --map-column-java DATE=java.sql.Date,INSERT_TS=java.sql.Timestamp
but the process ended with this error:
Caused by: java.lang.RuntimeException: Can't parse input data: '1595332359218'
    at CORE_ETL_DEPOSIT_TURNOVER.__loadFromFields(CORE_ETL_DEPOSIT_TURNOVER.java:546)
    at CORE_ETL_DEPOSIT_TURNOVER.parse(CORE_ETL_DEPOSIT_TURNOVER.java:431)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:88)
    ... 10 more
Caused by: java.lang.IllegalArgumentException
    at java.sql.Date.valueOf(Date.java:143)
    at CORE_ETL_DEPOSIT_TURNOVER.__loadFromFields(CORE_ETL_DEPOSIT_TURNOVER.java:529)
    ... 12 more
I wonder if there is a way to export this file to Oracle without changing the format of the data in HDFS.
As per the official Sqoop documentation:
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_export_date_and_timestamp_data_types_into_oracle
When exporting data from HDFS, the Sqoop export command will fail if the data is not in the required format, and the required format for a timestamp is yyyy-mm-dd hh24:mi:ss.ff. So you will have to format the timestamps in your files to conform to that format in order to export them properly to Oracle.
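For example, a hedged Hive-side sketch of that conversion (the staging table and most column names are hypothetical; it assumes the sixth CSV field, INSERT_TS, holds epoch milliseconds, as the sample rows suggest):
-- sketch only: render an epoch-millisecond value in the timestamp format Oracle expects,
-- before re-exporting the directory with sqoop
SELECT
  account_no,
  business_date,
  amount,
  col4,
  col5,
  from_unixtime(CAST(insert_ts / 1000 AS BIGINT), 'yyyy-MM-dd HH:mm:ss') AS insert_ts,
  source_id
FROM deposit_turnover_raw;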

Sqoop Support for Oracle Date Format

Using Sqoop I am importing from an Oracle table into HDFS and loading it into a managed Hive table by giving the HDFS path location. Below is the sqoop command:
sqoop import \
--connect jdbcconnection \
--username user \
--password password \
--table EMPDETAILS \
--column "EMP_ID,EMP_NAME,EMP_DOB,EMP_DOJ" \
--target-dir hdfspath \
-m 1
This command executed successfully, but when loading into the Hive table using the HDFS location it gives NULL for EMP_DOB (whose Hive type is date).
create table EMP_TARGET(
empid int,
empname string,
empdob date,
empdoj timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
Location 'hdfspath';
When I query the target Hive table, the empdob column is NULL but empdoj has the correct value. When I checked the value in the HDFS path for empdob, it is 1980-01-01 00:00:00:0.
Kindly help to solve the issue.
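One possible workaround, sketched below on the assumption that the NULLs come from Hive's date type failing to parse the full timestamp string Sqoop wrote for EMP_DOB: land that column as a string and derive the date at query time (the DDL mirrors the question; the SELECT is only an illustration).
create table EMP_TARGET(
empid int,
empname string,
empdob string,      -- landed as string because the raw value carries a time component
empdoj timestamp)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
Location 'hdfspath';

-- derive a proper date from the leading yyyy-MM-dd part of the string
select empid, empname, cast(substr(empdob, 1, 10) as date) as empdob, empdoj
from EMP_TARGET;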

How can I point to Sqoop to use the TAB as a delimiter?

I'm trying to get data from Hadoop to MySQL. For this aim I'm using Sqoop.
On the Hadoop (HDFS) side, the output I receive is key,value pairs separated by TAB. Now I would like to put the output into the DB via Sqoop:
sqoop-export --connect jdbc:mysql://localhost/test \
--username root --password pswd \
--table counter \
--export-dir /usr/local/hadoop/output \
--input-fields-terminated-by '***TAB***'
How can I tell Sqoop, in the --input-fields-terminated-by option, to use TAB as the delimiter?
Use
--input-fields-terminated-by "\t"

Sqoop function '--map-column-hive' being ignored

I am trying to import a file into Hive as Parquet, and the --map-column-hive column_name=timestamp option is being ignored. The column 'column_name' is originally of type datetime in SQL Server and gets converted to bigint in Parquet. I want to convert it to timestamp through Sqoop, but it is not working.
sqoop import \
--table table_name \
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--connect jdbc:sqlserver://servername \
--username user --password pw \
--map-column-hive column_name=timestamp \
--as-parquetfile \
--hive-import \
--hive-table table_name -m 1
When I view the table in hive, it still shows the column with its original datatype.
I tried column_name=string and that did not work either.
I think this may be an issue with converting files to parquet but I am not sure. Does anyone have a solution to fix this?
I get no errors when running the command; it just completes the import as if the option did not exist.
Before Hive 1.2, timestamp support in ParquetSerDe is not available; only the binary data type is supported in 1.1.0.
Please upgrade your Hive version to 1.2 or later and it should work.
Please check the issue log and release notes below:
https://issues.apache.org/jira/browse/HIVE-6384
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329345&styleName=Text&projectId=12310843
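If upgrading is not possible right away, one hedged workaround (a sketch only; it assumes the Parquet column lands as a bigint holding epoch milliseconds, and column_name stands in for the real column as in the question) is to convert at query time:
-- sketch only: read the bigint as epoch milliseconds and render it as a timestamp string
select from_unixtime(cast(column_name / 1000 as bigint)) as column_name_ts
from table_name;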

How to use sqoop to export the default hive delimited output?

I have a hive query:
insert overwrite directory '/x'
select ...
Then I try to export the data with Sqoop:
sqoop export --connect jdbc:mysql://mysqlm/site --username site --password site --table x_data --export-dir /x --input-fields-terminated-by 0x01 --lines-terminated-by '\n'
But this seems to fail to parse the fields according to the delimiter.
What am I missing?
I think the --input-fields-terminated-by 0x01 part doesn't work as expected?
I do not want to create additional tables in hive that contains the query results.
stack trace:
2013-09-24 05:39:21,705 ERROR org.apache.sqoop.mapreduce.TextExportMapper: Exception:
java.lang.NumberFormatException: For input string: "9-2"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Integer.parseInt(Integer.java:458)
...
The vi view of the output:
16-09-2013 23^A1182^A-1^APub_X^A21782^AIT^A1^A0^A0^A0^A0^A0.0^A0.0^A0.0
16-09-2013 23^A1182^A6975^ASoMo Audience Corp^A2336143^AUS^A1^A1^A0^A0^A0^A0.2^A0.0^A0.0
16-09-2013 23^A1183^A-1^APub_UK, Inc.^A1564001^AGB^A1^A0^A0^A0^A0^A0.0^A0.0^A0.0
17-09-2013 00^A1120^A-1^APub_US^A911^A--^A181^A0^A0^A0^A0^A0.0^A0.0^A0.0
I've found the correct solution for that special character in bash
#!/bin/bash
# ... your script
hive_char=$( printf "\x01" )
sqoop export --connect jdbc:mysql://mysqlm/site --username site --password site --table x_data --export-dir /x --input-fields-terminated-by ${hive_char} --lines-terminated-by '\n'
The problem was correct separator recognition (nothing to do with types or schema), and that was achieved with hive_char.
Another way to enter this special character on the Linux command line is to type Ctrl+V followed by Ctrl+A.
Using
--input-fields-terminated-by '\001' --lines-terminated-by '\n'
as flags in the sqoop export command seems to do the trick for me.
So, in your example, the full command would be:
sqoop export --connect jdbc:mysql://mysqlm/site --username site --password site --table x_data --export-dir /x --input-fields-terminated-by '\001' --lines-terminated-by '\n'
I think it's a data type mismatch with your RDBMS schema.
Try to find the column that receives the "9-2" value and check its data type in the RDBMS schema.
If it is int or numeric, Sqoop will try to parse the value and insert it, and "9-2" is not a numeric value.
Let me know if this doesn't work.
It seems like Sqoop is taking '0' as the delimiter.
You are getting the error because:
the first column in your MySQL table could be varchar and the second column is a number.
As per the string below (spaces mark where the '0' delimiter splits it):
16- 0 9-2 0 13 23^A1182^A-1^APub_X^A21782^AIT^A1^A0^A0^A0^A0^A0.0^A0.0^A0.0
The first column parsed by Sqoop is: 16-
and the second column is: 9-2
So it is better to specify the delimiter in quotes ('0x01'),
or
(it is always easier and gives better control) use the Hive create table command:
create table tablename row format delimited fields terminated by '\t' as select ...
and specify '\t' as the delimiter in your sqoop command, as sketched below.
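A minimal sketch of that approach (the staging table name x_export and the warehouse path used as --export-dir are assumptions; the connection details are reused from the question):
-- stage the query results with an explicit tab delimiter
create table x_export
row format delimited fields terminated by '\t'
as select ...;
Then export that table's directory with the matching delimiter:
sqoop export --connect jdbc:mysql://mysqlm/site --username site --password site --table x_data --export-dir /user/hive/warehouse/x_export --input-fields-terminated-by '\t' --lines-terminated-by '\n'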
